A VARIATIONAL APPROACH TO THE CONSISTENCY OF SPECTRAL CLUSTERING

NICOLAS GARCIA TRILLOS AND DEJAN SLEPCEV 


Abstract. This paper establishes the consistency of spectral approaches to data clustering. We 
consider clustering of point clouds obtained as samples of a ground-truth measure. A graph 
representing the point cloud is obtained by assigning weights to edges based on the distance 
between the points they connect. We investigate the spectral convergence of both unnormalized 
and normalized graph Laplacians towards the appropriate operators in the continuum domain. We 
obtain sharp conditions on how the connectivity radius can be scaled with respect to the number 
of sample points for the spectral convergence to hold. We also show that the discrete clusters 
obtained via spectral clustering converge towards a continuum partition of the ground truth 
measure. Such continuum partition minimizes a functional describing the continuum analogue of 
the graph-based spectral partitioning. Our approach, based on variational convergence, is general 
and flexible. 


1. Introduction 

Clustering is one of the basic problems of statistics and machine learning: given a collection
of $n$ data points and a measure of their pairwise similarity, the task is to partition the data into $k$
meaningful groups. There is a variety of criteria for the quality of a partitioning and a plethora of
clustering algorithms, overviewed in [?]. Among the most widely used are centroid based (for
example the $k$-means algorithm), agglomeration based (or hierarchical) and graph based ones. Many
graph partitioning approaches are based on dividing the graph representing the data into clusters of
balanced sizes which have as few edges between them as possible [?]. Spectral
clustering is a relaxation of minimizing graph cuts which, in any of its variants [?], consists
of two steps. The first step is the embedding step, where data points are mapped to a Euclidean space
by using the spectrum of a graph Laplacian. In the second step, the actual clustering is obtained
by applying a clustering algorithm like $k$-means to the transformed points.

The input of a spectral clustering algorithm is a weight matrix W which captures the similarity 
relation between the data points. Typically, the choice of edge weights depends on the distance 
between the data points and a parameter $\varepsilon$ which determines the length scale over which points are
connected. We assume that the data set is a random sample of an underlying ground-truth measure. 
We investigate the convergence of spectral clustering as the number of available data points goes to 
infinity. 

For any given clustering procedure, a natural and important question is whether the procedure is
consistent, that is, whether the partitioning of the data into groups converges to some meaningful
partitioning in the limit as more data is collected. Despite the abundance of clustering
procedures in the literature, not many results establish their consistency in the nonparametric
setting, where the data is assumed to be obtained from an unknown general distribution. Consistency
of $k$-means clustering was established by Pollard [?]. Consistency of $k$-means clustering for paths
with regularization was recently studied by Thorpe, Theil and Cade [44], using a viewpoint similar
to that of this paper. Consistency for a class of single linkage clustering algorithms was shown
by Hartigan [22]. Arias-Castro and Pelletier have proved the consistency of maximum variance
unfolding [3]. Pointwise estimates between graph Laplacians and the continuum operators were
studied by Belkin and Niyogi [8], Coifman and Lafon [?], Giné and Koltchinskii [?], Hein, Audibert
and von Luxburg [23], and Singer [?]. Spectral convergence was studied in the works of Ting,
Huang, and Jordan [45], Belkin and Niyogi [7] on the convergence of Laplacian eigenmaps, von
Luxburg, Belkin and Bousquet on graph Laplacians, and of Singer and Wu [38] on the connection
graph Laplacian. The convergence of the eigenvalues and eigenvectors these works obtain is of
great relevance to machine learning. However, obtaining practical and rigorous rates at which the
connectivity length scale $\varepsilon_n \to 0$ as $n \to \infty$ remained an open problem. Also relevant to point cloud
analysis are studies of Laplacians on discretized manifolds by Burago, Ivanov and Kurylev [10], who
obtain precise error estimates for eigenvalues and eigenvectors.

Recently the authors in [?], and together with Laurent, von Brecht and Bresson in [17], introduced
a framework for showing the consistency of clustering algorithms based on minimizing an
objective functional on graphs. In [?] they applied the technique to Cheeger and Ratio cuts. Here
the framework of [?] is used to prove new results on consistency of spectral clustering, which establish
the (almost) optimal rate at which the connectivity radius $\varepsilon$ can be taken to 0 as $n \to \infty$. We
prove the convergence of the spectrum of the graph Laplacian towards the spectrum of a corresponding
continuum operator. An important element of our work is that we establish the convergence of
the discrete clusters obtained via spectral clustering to their continuum counterparts. That is, as
the number of data points $n \to \infty$, the discrete clusters (obtained via spectral clustering) are shown
to converge towards continuum objects (measures), which themselves are obtained via a clustering
procedure in the continuum setting (performed on the ground truth measure). In other words, the discrete
clusters are shown to converge to continuum clusters obtained via a spectral clustering procedure with
full information (the ground truth measure) available. We obtain results for unnormalized (Theorem
1.2) and normalized (Theorem 1.5 and Corollary 1.7) graph Laplacians. The bridge connecting the spectrum
of the graph Laplacian and the spectrum of a limiting operator in the continuum is built by
using the notion of variational convergence known as $\Gamma$-convergence. The setting of $\Gamma$-convergence,
combined with techniques of optimal transportation, provides an effective viewpoint to address a
range of consistency and stability problems based on minimizing objective functionals on a random
sample of a measure.

1.1. Description of spectral clustering. Let $V = \{x_1, \dots, x_n\}$ be a set of vertices and let
$W \in \mathbb{R}^{n \times n}$ be a symmetric matrix with non-negative entries. We define $\mathcal{D} \in \mathbb{R}^{n \times n}$, the degree
matrix of the weighted graph $(V, W)$, to be the diagonal matrix with $\mathcal{D}_{ii} = \sum_j W_{ij}$ for every $i$.
Also, we define $L$, the unnormalized graph Laplacian matrix of the weighted graph $(V, W)$, to be

(1.1)    $L := \mathcal{D} - W$.

We also consider the matrices $N^{sym}$ and $N^{rw}$ given by

$N^{sym} := \mathcal{D}^{-1/2} L \mathcal{D}^{-1/2}$,    $N^{rw} := \mathcal{D}^{-1} L$,

both of which we refer to as normalized graph Laplacians. The superscript sym indicates the fact
that $N^{sym}$ is symmetric, whereas the superscript rw indicates the fact that $N^{rw}$ is connected
to the transition probabilities of a random walk that can be defined on the graph. Each of the
matrices $L$, $N^{sym}$, $N^{rw}$ is used in a version of spectral clustering. The so-called unnormalized
spectral clustering uses the spectrum of the unnormalized graph Laplacian to embed the point
cloud into a lower dimensional space; typically a method like $k$-means on the embedded points then
provides the desired clusters (see [47]). This is Algorithm 1 below.
Algorithm 1 Unnormalized spectral clustering

Input: Number of clusters $k$ and similarity matrix $W$.

- Construct the unnormalized graph Laplacian $L$.

- Compute the eigenvectors $u_1, \dots, u_k$ of $L$ associated to the $k$ smallest (nonzero) eigenvalues
of $L$.

- Define the matrix $U \in \mathbb{R}^{k \times n}$, where the $i$-th row of $U$ is the vector $u_i$.

- For $i = 1, \dots, n$, let $y_i \in \mathbb{R}^k$ be the $i$-th column of $U$.

- Use the $k$-means algorithm to partition the set of points $\{y_1, \dots, y_n\}$ into $k$ groups, that we
denote by $G_1, \dots, G_k$.

Output: Clusters $G_1, \dots, G_k$.
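
The steps of Algorithm 1 can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the six sample points, the Gaussian weights, the helper names (`unnormalized_laplacian`, `spectral_embedding`, `lloyd_kmeans`) and the farthest-point seeding of the Lloyd iteration are all made up for the example. The sketch uses the eigenvectors of the $k$ smallest eigenvalues including the trivial eigenvalue zero, as in the continuum description of Subsection 1.3; skipping the zero eigenvalue would be a one-line change.

```python
import numpy as np

def unnormalized_laplacian(W):
    """Return L = D - W, where D is diagonal with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def spectral_embedding(W, k):
    """Return an (n, k) array whose i-th row is the embedded point y_i."""
    # eigh handles symmetric matrices and sorts eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(unnormalized_laplacian(W))
    return eigenvectors[:, :k]

def lloyd_kmeans(Y, k, iterations=100):
    """Basic Lloyd iteration; farthest-point seeding is an illustrative choice."""
    centers = [Y[0]]
    for _ in range(k - 1):
        dist_to_seeds = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist_to_seeds)])
    centers = np.array(centers)
    for _ in range(iterations):
        sq_dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        updated = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

# Two well-separated groups of three points each (made-up data).
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [3.0, 3.0], [3.2, 2.9], [2.8, 3.3]])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-sq)              # Gaussian similarity, an assumed kernel
np.fill_diagonal(W, 0.0)     # self-loops do not affect L = D - W
labels = lloyd_kmeans(spectral_embedding(W, k=2), k=2)
```

On this toy input the two groups of points receive different labels; which group is labelled 0 depends on the seeding.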


The normalized graph Laplacians are used in the same spirit. An algorithm for normalized
spectral clustering using $N^{sym}$ was introduced in [29] (see Algorithm 2), and an algorithm using
$N^{rw}$ was introduced in [35] (see Algorithm 3).


Algorithm 2 Normalized spectral clustering as defined in [29]

Input: Number of clusters $k$ and similarity matrix $W$.

- Construct the normalized graph Laplacian $N^{sym}$.

- Compute the eigenvectors $u_1, \dots, u_k$ of $N^{sym}$ associated to the $k$ smallest (nonzero) eigenvalues
of $N^{sym}$.

- Define the matrix $U \in \mathbb{R}^{k \times n}$, where the $i$-th row of $U$ is the vector $u_i$.

- Construct the matrix $V$ by normalizing the columns of $U$ so that all columns of $V$ have
Euclidean norm equal to one.

- For $i = 1, \dots, n$, let $y_i \in \mathbb{R}^k$ be the $i$-th column of $V$.

- Use the $k$-means algorithm to partition the set of points $\{y_1, \dots, y_n\}$ into $k$ groups that we
denote by $G_1, \dots, G_k$.

Output: Clusters $G_1, \dots, G_k$.


Algorithm 3 Normalized spectral clustering as defined in [35]

Same as Algorithm 1 but using the normalized graph Laplacian $N^{rw}$ instead of $L$.


Spectral properties of graph Laplacians have connections to balanced graph cuts. For example,
the spectrum of $N^{rw}$ is shown to be connected to the Ncut problem, whereas the spectrum of $L$
is connected to RatioCut (see [47]). A probabilistic interpretation of the spectrum of $N^{rw}$ may be
found in [28]. In addition, connections between normalized graph Laplacians, data parametrization
and dimensionality reduction via diffusion maps are developed in [55].

We now present some facts about the matrices $L$, $N^{sym}$ and $N^{rw}$, all of which may be found in
[?]. First of all, $L$ is a positive semidefinite symmetric matrix. In fact, for every vector $u \in \mathbb{R}^n$

(1.2)    $\langle Lu, u \rangle = \frac{1}{2} \sum_{i,j} W_{ij} (u_i - u_j)^2$,

where on the left hand side we are using the usual inner product in $\mathbb{R}^n$. The smallest eigenvalue
of $L$ is equal to zero, and its multiplicity is equal to the number of connected components of the
weighted graph. The matrix $N^{sym}$ is symmetric and positive semidefinite as well. Moreover, for
every $u \in \mathbb{R}^n$

(1.3)    $\langle N^{sym} u, u \rangle = \frac{1}{2} \sum_{i,j} W_{ij} \left( \frac{u_i}{\sqrt{\mathcal{D}_{ii}}} - \frac{u_j}{\sqrt{\mathcal{D}_{jj}}} \right)^2$.

In addition, 0 is an eigenvalue of $N^{sym}$, with multiplicity equal to the number of connected components
of the weighted graph. The vector $\mathcal{D}^{1/2} \mathbf{1}$ (where $\mathbf{1}$ is the vector with all entries equal to one)
is an eigenvector of $N^{sym}$ with eigenvalue 0.

The two forms of normalized graph Laplacians are closely related due to the correspondence
between the spectra of $N^{sym}$ and $N^{rw}$. In fact, it is straightforward to show that

(1.4)    $N^{rw} u = \lambda u$ if and only if $N^{sym} w = \lambda w$, where $w = \mathcal{D}^{1/2} u$.

That is, $N^{sym}$ and $N^{rw}$ have the same eigenvalues, and there is a simple relation between their
corresponding eigenvectors.
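
The identity (1.2) and the correspondence (1.4) are easy to check numerically on a small graph. The following NumPy sketch does so; the 4-by-4 weight matrix and the test vector are arbitrary values chosen for illustration, not data from the paper.

```python
import numpy as np

# Arbitrary symmetric weight matrix of a connected graph (illustrative values).
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 3.0],
              [0.0, 1.0, 3.0, 0.0]])
n = W.shape[0]
d = W.sum(axis=1)                       # degrees D_ii
L = np.diag(d) - W                      # unnormalized Laplacian (1.1)
N_sym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)
N_rw = np.diag(1.0 / d) @ L

# Identity (1.2): <Lu, u> equals half the weighted double sum.
u = np.array([1.0, -2.0, 0.5, 3.0])
quadratic_form = u @ L @ u
double_sum = 0.5 * sum(W[i, j] * (u[i] - u[j]) ** 2
                       for i in range(n) for j in range(n))

# Relation (1.4): if N_sym w = tau w, then u = D^{-1/2} w satisfies N_rw u = tau u.
tau, w = np.linalg.eigh(N_sym)          # columns of w are eigenvectors
U = w / np.sqrt(d)[:, None]             # apply D^{-1/2} to each eigenvector
residual = np.abs(N_rw @ U - U * tau).max()
```

Both `quadratic_form - double_sum` and `residual` vanish up to rounding, and the smallest entry of `tau` is zero because the graph is connected.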


1.2. Spectral clustering of point clouds. Let $V = \{x_1, \dots, x_n\}$ be a point cloud in $\mathbb{R}^d$. To give
a weighted graph structure to the set $V$, we consider a kernel $\eta$, that is, we consider $\eta : \mathbb{R}^d \to [0, \infty)$
a radially symmetric, radially decreasing function decaying to zero sufficiently fast. The kernel is
appropriately rescaled to take into account data density. In particular, let $\eta_\varepsilon$ depend on the length
scale $\varepsilon$, where we take $\eta_\varepsilon : \mathbb{R}^d \to \mathbb{R}$ to be defined by

$\eta_\varepsilon(z) := \frac{1}{\varepsilon^d} \eta\left(\frac{z}{\varepsilon}\right)$.

In this way we impose that significant weight is given to edges connecting points up to distance $\varepsilon$.
We consider the similarity matrix $W^\varepsilon$ defined by

(1.5)    $W^\varepsilon_{i,j} := \eta_\varepsilon(x_i - x_j)$.

We denote by $\mathcal{L}_{n,\varepsilon}$ the unnormalized graph Laplacian (1.1) of the weighted graph $(V, W^\varepsilon)$, that is

(1.6)    $\mathcal{L}_{n,\varepsilon} := \mathcal{D}^\varepsilon - W^\varepsilon$,

where $\mathcal{D}^\varepsilon$ is the diagonal matrix with $\mathcal{D}^\varepsilon_{i,i} = \sum_j W^\varepsilon_{i,j}$.

We define the Dirichlet energy on the graph of a function $u : V \to \mathbb{R}$ to be

(1.7)    $\sum_{i,j} W^\varepsilon_{i,j} (u(x_i) - u(x_j))^2$.

The fact that $\eta$ is a symmetric function guarantees that $W^\varepsilon$ is symmetric and thus all the facts
presented in Subsection 1.1 apply. In particular, (1.2) can be stated as: for every function $u : V \to \mathbb{R}$

(1.8)    $\langle \mathcal{L}_{n,\varepsilon} u, u \rangle = \frac{1}{2} \sum_{i,j} W^\varepsilon_{i,j} (u(x_i) - u(x_j))^2$,

where on the left hand side we have identified the function $u$ with the vector $(u(x_1), \dots, u(x_n))$ in
$\mathbb{R}^n$ and where $\langle \cdot, \cdot \rangle$ denotes the usual inner product in $\mathbb{R}^n$.
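
The construction of the similarity matrix in (1.5) and the graph Dirichlet energy (1.7) can be sketched in plain Python as follows. The indicator profile, the four sample points, the value of the length scale and the function names are assumptions chosen for illustration; the rescaling by $\varepsilon^{-d}$ follows the definition of the rescaled kernel above.

```python
import math

def eta(t):
    """Radial profile: indicator of [0, 1], one simple compactly supported choice."""
    return 1.0 if t <= 1.0 else 0.0

def similarity_matrix(points, eps):
    """W[i][j] = eps^(-d) * eta(|x_i - x_j| / eps), following (1.5)."""
    d = len(points[0])
    n = len(points)
    return [[eta(math.dist(points[i], points[j]) / eps) / eps ** d
             for j in range(n)] for i in range(n)]

def graph_dirichlet_energy(W, u):
    """Double sum (1.7): sum over i, j of W_ij * (u_i - u_j)^2."""
    n = len(u)
    return sum(W[i][j] * (u[i] - u[j]) ** 2 for i in range(n) for j in range(n))

# Three nearby points and one far away point in the plane (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.15), (1.0, 1.0)]
W = similarity_matrix(points, eps=0.2)
u = [1.0, 2.0, 0.0, 5.0]
energy = graph_dirichlet_energy(W, u)
```

With $\varepsilon = 0.2$ only the first three points are connected, each edge carries weight $\varepsilon^{-2} = 25$, and the isolated fourth point contributes nothing to the energy, whatever value $u$ takes there.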

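The symmetric normalized graph Laplacian $\mathcal{N}^{sym}_{n,\varepsilon}$ is given by

$\mathcal{N}^{sym}_{n,\varepsilon} := (\mathcal{D}^\varepsilon)^{-1/2} \mathcal{L}_{n,\varepsilon} (\mathcal{D}^\varepsilon)^{-1/2}$.

Since the kernel $\eta$ is assumed radially symmetric, it can be defined as $\eta(x) := \boldsymbol{\eta}(|x|)$ for all
$x \in \mathbb{R}^d$, where $\boldsymbol{\eta} : [0, \infty) \to [0, \infty)$ is the radial profile. We assume the following properties on $\boldsymbol{\eta}$:

(K1) $\boldsymbol{\eta}(0) > 0$ and $\boldsymbol{\eta}$ is continuous at 0.

(K2) $\boldsymbol{\eta}$ is non-increasing.

(K3) The integral $\int_0^\infty \boldsymbol{\eta}(r)\, r^{d+1}\, dr$ is finite.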
Remark 1.1. We remark that the last assumption on $\boldsymbol{\eta}$ is equivalent to imposing that the surface
tension

(1.9)    $\sigma_\eta := \int_{\mathbb{R}^d} \eta(h) |h_1|^2 \, dh$

is finite, where $h_1$ represents the first component of $h$. The second condition implies that more
relevance is given to the interactions between points that are close to each other. We notice that
the class of acceptable kernels is quite broad and includes both Gaussian kernels and discontinuous
kernels like the one defined by a function $\boldsymbol{\eta}$ of the form $\boldsymbol{\eta}(t) = 1$ for $t \le 1$ and $\boldsymbol{\eta}(t) = 0$ for $t > 1$.


We focus on point clouds that are obtained as independent samples from a given distribution $\nu$.
Specifically, consider an open, bounded, and connected set $D \subset \mathbb{R}^d$ with Lipschitz boundary (i.e.
locally the graph of a Lipschitz function) and consider a probability measure $\nu$ supported on $D$.
We assume $\nu$ has a continuous density $\rho$, which is bounded above and below by positive constants
on $D$. We assume that the points $x_1, \dots, x_n$ (i.i.d. random points) are chosen according to the
distribution $\nu$. We consider the graph with nodes $V = \{x_1, \dots, x_n\}$ and edge weights $\{W^\varepsilon_{i,j}\}_{i,j}$
defined in (1.5). For an appropriate scaling of $\varepsilon := \varepsilon_n$ with respect to $n$, we study the limiting
behavior of the eigenvalues and eigenvectors of the graph Laplacians as $n \to \infty$. We now describe
the continuum problems which characterize the limit.


1.3. Description of spectral clustering in the continuum setting: the unnormalized case.

Let the domain $D$ and the "ground-truth" measure $\nu$ with density $\rho$ be as above. The object that characterizes
the limit of the graph Laplacians $\mathcal{L}_{n,\varepsilon_n}$ as $n \to \infty$ is the differential operator

(1.10)    $\mathcal{L}(u) := -\frac{1}{\rho} \mathrm{div}(\rho^2 \nabla u)$.

We consider the pairs $\lambda \in \mathbb{R}$ and $u \in H^1(D)$ (the Sobolev space of $L^2(D)$ functions with distributional
derivative $\nabla u$ in $L^2(D, \mathbb{R}^d)$), with $u$ not identically equal to zero, such that

(1.11)    $\mathcal{L} u = \lambda u$ in $D$,    $\frac{\partial u}{\partial n} = 0$ on $\partial D$.

A function $u$ as above is said to be an eigenfunction of $\mathcal{L}$ with corresponding eigenvalue $\lambda \in \mathbb{R}$. In
Subsection 2.4 we discuss the precise definition of a solution of (1.11) and present some facts about
it. In particular, $\mathcal{L}$ is a positive semidefinite self-adjoint operator with respect to the inner product
$\langle \cdot, \cdot \rangle_{L^2(D,\nu)}$ and has a discrete spectrum that can be arranged as an increasing sequence converging
to infinity

$0 = \lambda_1 \le \lambda_2 \le \dots$,

where each eigenvalue is repeated according to (finite) multiplicity. Furthermore, there exists an
orthonormal basis of $L^2(D)$ (with respect to the inner product $\langle \cdot, \cdot \rangle_{L^2(D,\nu)}$) consisting of eigenfunctions
$u_i$ of $\mathcal{L}$.

Given a mapping $\Phi : D \to \mathbb{R}^k$, by $\Phi_\sharp \nu$ we denote the push forward of the measure $\nu$, namely
the measure for which $\Phi_\sharp \nu(A) = \nu(\Phi^{-1}(A))$ for any Borel set $A$. The continuum spectral clustering
analogous to the discrete one of Algorithm 1 is as follows. Let $u_1, \dots, u_k : D \to \mathbb{R}$ be the
orthonormal set of eigenfunctions corresponding to eigenvalues $\lambda_1, \dots, \lambda_k$. Consider the measure
$\mu = (u_1, \dots, u_k)_\sharp \nu$. Let $\tilde{G}_i \subset \mathbb{R}^k$ be the clusters obtained by $k$-means clustering of $\mu$. Then
$G_i = (u_1, \dots, u_k)^{-1}(\tilde{G}_i)$ for $i = 1, \dots, k$ define the spectral clustering of $\nu$.
1.4. Description of spectral clustering in the continuum setting: the normalized cases.

The object that characterizes the limit of the symmetric normalized graph Laplacians $\mathcal{N}^{sym}_{n,\varepsilon_n}$ as
$n \to \infty$ is the differential operator

$\mathcal{N}^{sym}(u) := -\frac{1}{\rho^{3/2}} \mathrm{div}\left( \rho^2 \nabla \left( \frac{u}{\sqrt{\rho}} \right) \right)$.

We consider the space

(1.12)    $H^1_{\sqrt{\rho}}(D) := \left\{ u \in L^2(D) : \frac{u}{\sqrt{\rho}} \in H^1(D) \right\}$.

The spectrum of $\mathcal{N}^{sym}$ is the set of pairs $\tau \in \mathbb{R}$ and $u \in H^1_{\sqrt{\rho}}(D)$, where $u$ is not identically equal
to zero, such that

(1.13)    $\mathcal{N}^{sym}(u) = \tau u$ in $D$,    $\frac{\partial (u/\sqrt{\rho})}{\partial n} = 0$ on $\partial D$.

The sense in which (1.13) holds is made precise in Subsection 2.4. The spectrum of the operator
$\mathcal{N}^{sym}$ has similar properties to those of the spectrum of $\mathcal{L}$. We let

$0 = \tau_1 \le \tau_2 \le \dots$

denote the eigenvalues of $\mathcal{N}^{sym}$, repeated according to multiplicity.

The continuum spectral clustering analogous to the discrete one of Algorithm 2 is as follows. Let
$u_1, \dots, u_k : D \to \mathbb{R}$ be the orthonormal set of eigenfunctions (with respect to the inner product
$\langle \cdot, \cdot \rangle_{L^2(D,\nu)}$) corresponding to eigenvalues $\tau_1, \dots, \tau_k$. Normalize them by

$(\tilde{u}_1(x), \dots, \tilde{u}_k(x)) := \frac{(u_1(x), \dots, u_k(x))}{\| (u_1(x), \dots, u_k(x)) \|}$ for all $x \in D$.

Consider the measure $\tilde{\mu} = (\tilde{u}_1, \dots, \tilde{u}_k)_\sharp \nu$. Let $\tilde{G}_i \subset \mathbb{R}^k$ be the clusters obtained by $k$-means
clustering of $\tilde{\mu}$. Then $G_i = (\tilde{u}_1, \dots, \tilde{u}_k)^{-1}(\tilde{G}_i)$ for $i = 1, \dots, k$ define the spectral clustering of $\nu$.

Finally, the limit of the graph Laplacians $\mathcal{N}^{rw}_{n,\varepsilon_n} = (\mathcal{D}^{\varepsilon_n})^{-1} \mathcal{L}_{n,\varepsilon_n}$ is
described by the operator $\mathcal{N}^{rw}$:

$\mathcal{N}^{rw}(u) := -\frac{1}{\rho^2} \mathrm{div}(\rho^2 \nabla u)$.

As discussed in Subsection 2.4, the eigenvalues of $\mathcal{N}^{rw}$ are equal to the eigenvalues of $\mathcal{N}^{sym}$. The
continuum clustering, which is analogous to the discrete one of Algorithm 3, is as in Subsection 1.3
where eigenfunctions of $\mathcal{N}^{rw}$ are used.


1.5. Passage from discrete to continuum. We are interested in showing that as $n \to \infty$ eigenvalues
of discrete graph Laplacians and the associated eigenvectors converge towards eigenvalues and
eigenfunctions of corresponding differential operators. The issue that arises is how to compare functions
in the discrete and continuum settings. Typically this is achieved by introducing an interpolation
operator that takes discretely defined functions to continuum ones and a restriction operator which
restricts the continuum function to the discrete setting. For this setting to work, some smoothness
of the functions considered is required. Furthermore, the choice of the interpolation operator and its
properties adds an intermediate step that needs to be understood.

We choose a different route and introduce a way to compare the functions between settings 
directly. This approach is quite general and does not require any regularity assumptions. We use 
the $TL^p$-topologies introduced in [?]; in this paper we focus on the $TL^2$-topology
that we now recall. Denote by $\nu_n$ the empirical measure associated to the $n$ data points, that is

(1.14)    $\nu_n := \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$.

For a given function $u \in L^2(D, \nu)$, the question is how to compare $u$ with a function $v \in L^2(D, \nu_n)$
(a function defined on the set $V$). More generally, one can consider the problem of how to compare
functions in $L^2(D, \mu)$ with those in $L^2(D, \theta)$ for arbitrary probability measures $\mu$, $\theta$ on $D$. We define
the set of objects that includes both the functions in the discrete setting and those in the continuum setting
as follows:

$TL^2(D) := \{ (\mu, f) : \mu \in \mathcal{P}(D), \ f \in L^2(D, \mu) \}$,

where $\mathcal{P}(D)$ denotes the set of Borel probability measures on $D$. For $(\mu, f)$ and $(\theta, g)$ in $TL^2$ we
define the distance

$d_{TL^2}((\mu, f), (\theta, g)) := \inf_{\pi \in \Gamma(\mu, \theta)} \left( \iint_{D \times D} |x - y|^2 + |f(x) - g(y)|^2 \, d\pi(x, y) \right)^{1/2}$,

where $\Gamma(\mu, \theta)$ is the set of all couplings (or transportation plans) between $\mu$ and $\theta$, that is, the set
of all Borel probability measures on $D \times D$ for which the marginal on the first variable is $\mu$ and the
marginal on the second variable is $\theta$. It was proved in [?] that $d_{TL^2}$ is indeed a metric on $TL^2$.
As remarked in [?], one of the nice features of the convergence in $TL^2$ is that it simultaneously
generalizes the weak convergence of probability measures and the convergence in $L^2$ of functions. It
also provides us with a way to compare functions which are supported in sets as different as point
clouds and continuous domains. In Subsection 2.1 we present more details about this metric.
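
For two uniform empirical measures on the same number of points, the infimum over couplings in the $TL^2$ distance is attained at a permutation (a standard property of assignment problems), so for very small point sets it can be computed by brute force. The sketch below does this in plain Python; the function name `tl2_distance` and the sample data are illustrative assumptions, and the enumeration of all permutations is only feasible for a handful of points.

```python
import itertools
import math

def tl2_distance(xs, f, ys, g):
    """TL^2 distance between (mu, f) and (theta, g), with mu and theta uniform on
    the point lists xs and ys of equal length; brute force over permutations."""
    n = len(xs)
    assert len(ys) == n and len(f) == n and len(g) == n
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Cost of the coupling that sends x_i to y_perm(i), each with mass 1/n.
        cost = sum(math.dist(xs[i], ys[perm[i]]) ** 2 + (f[i] - g[perm[i]]) ** 2
                   for i in range(n)) / n
        best = min(best, cost)
    return math.sqrt(best)

# Same two points on the line, function values shifted by 0.5 (made-up data).
xs = [(0.0,), (1.0,)]
f = [0.0, 1.0]
g = [0.5, 1.5]
same = tl2_distance(xs, f, xs, f)
shifted = tl2_distance(xs, f, xs, g)
```

Here `same` is zero, and `shifted` equals 0.5: the identity matching costs $0.5^2$ on average, while swapping the two points costs 2.25.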

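For a given $\mu \in \mathcal{P}(D)$ we denote by $L^2(\mu)$ the space of $L^2$-functions with respect to the measure
$\mu$. Also, for $f, g \in L^2(\mu)$ we write

$\langle f, g \rangle_\mu := \int_D f g \, d\mu$ and $\| f \|_\mu^2 := \langle f, f \rangle_\mu$.

Finally, if the measure $\mu$ has a density $\rho$, that is, if $d\mu = \rho \, dx$, we may write $\langle f, g \rangle_\rho$ and $\| f \|_\rho$ instead
of $\langle f, g \rangle_\mu$ and $\| f \|_\mu$.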


1.6. Convergence of eigenvalues, eigenvectors, and of spectral clustering: the unnormalized
case. Here we present one of the main results of this paper. We state the conditions
on $\varepsilon_n$ for the spectrum of the unnormalized graph Laplacian $\mathcal{L}_{n,\varepsilon_n}$, given in (1.6), to converge to
the spectrum of $\mathcal{L}$, given by (1.10), and for the spectral clustering of Algorithm 1 to converge to
the clustering of Subsection 1.3. Let $\lambda_1 \le \lambda_2 \le \dots$ be the eigenvalues of $\mathcal{L}$ and $u_1, u_2, \dots$ the
corresponding orthonormal eigenfunctions, as in Subsection 1.3. We recall that orthogonality is
considered with respect to the inner product in $L^2(\nu)$.

To state the results it is convenient to introduce $0 = \bar{\lambda}_1 < \bar{\lambda}_2 < \dots$, the sequence of distinct
eigenvalues of $\mathcal{L}$. For a given $k \in \mathbb{N}$, we denote by $s(k)$ the multiplicity of the eigenvalue $\bar{\lambda}_k$ and
we let $\hat{k} \in \mathbb{N}$ be such that $\bar{\lambda}_k = \lambda_{\hat{k}+1} = \dots = \lambda_{\hat{k}+s(k)}$. Also, we denote by $\mathrm{Proj}_k : L^2(\nu) \to L^2(\nu)$
the projection (with respect to the inner product $\langle \cdot, \cdot \rangle_\nu$) onto the eigenspace of $\mathcal{L}$ associated to
the eigenvalue $\bar{\lambda}_k$. For all large enough $n$, we denote by $\mathrm{Proj}^{(n)}_k : L^2(\nu_n) \to L^2(\nu_n)$ the projection
(with respect to the inner product $\langle \cdot, \cdot \rangle_{\nu_n}$) onto the space generated by all the eigenvectors of $\mathcal{L}_{n,\varepsilon_n}$
associated to the eigenvalues $\lambda^{(n)}_{\hat{k}+1}, \dots, \lambda^{(n)}_{\hat{k}+s(k)}$. Here, as in the rest of the paper, we identify $\mathbb{R}^n$
with the space $L^2(D, \nu_n)$.
Theorem 1.2 (Convergence of the spectra of the unnormalized graph Laplacians). Let $d \ge 2$ and
let $D \subset \mathbb{R}^d$ be an open, bounded, connected set with Lipschitz boundary. Let $\nu$ be a probability
measure on $D$ with continuous density $\rho$, satisfying

(1.15)    $(\forall x \in D) \quad m \le \rho(x) \le M$,

for some $0 < m \le M$. Let $x_1, \dots, x_n, \dots$ be a sequence of i.i.d. random points chosen according to
$\nu$. Let $\{\varepsilon_n\}_{n \in \mathbb{N}}$ be a sequence of positive numbers converging to 0 and satisfying

(1.16)    $\lim_{n \to \infty} \frac{(\log n)^{3/4}}{n^{1/2}} \frac{1}{\varepsilon_n} = 0$ if $d = 2$,
          $\lim_{n \to \infty} \frac{(\log n)^{1/d}}{n^{1/d}} \frac{1}{\varepsilon_n} = 0$ if $d \ge 3$.

Assume the kernel $\eta$ satisfies conditions (K1)-(K3). Then, with probability one, all of the following
statements hold true:

1. Convergence of Eigenvalues: For every $k \in \mathbb{N}$

(1.17)    $\lim_{n \to \infty} \frac{2 \lambda^{(n)}_k}{n \varepsilon_n^2} = \sigma_\eta \lambda_k$,

where $\sigma_\eta$ is defined in (1.9).

2. For every $k \in \mathbb{N}$, every sequence $\{u^n_k\}_{n \in \mathbb{N}}$ with $u^n_k$ an eigenvector of $\mathcal{L}_{n,\varepsilon_n}$ associated to
the eigenvalue $\lambda^{(n)}_k$ and with $\| u^n_k \|_{\nu_n} = 1$ is pre-compact in $TL^2$. Additionally, whenever
$u^n_k \overset{TL^2}{\longrightarrow} u_k$ along a subsequence as $n \to \infty$, then $\| u_k \|_\nu = 1$ and $u_k$ is an eigenfunction of $\mathcal{L}$
associated to $\lambda_k$.

3. Convergence of Eigenprojections: For all $k \in \mathbb{N}$ and for an arbitrary sequence $v_n \in L^2(\nu_n)$, if
$v_n \overset{TL^2}{\longrightarrow} v$ as $n \to \infty$ along some subsequence, then along that subsequence
$\mathrm{Proj}^{(n)}_k(v_n) \overset{TL^2}{\longrightarrow} \mathrm{Proj}_k(v)$ as $n \to \infty$.

4. Consistency of Spectral Clustering: Let $G^n_1, \dots, G^n_k$ be the clusters obtained in Algorithm
1. Let $\nu^n_i = \nu_n \llcorner G^n_i$ (the restriction of $\nu_n$ to $G^n_i$) for $i = 1, \dots, k$. Then $(\nu^n_1, \dots, \nu^n_k)$ is
precompact with respect to weak convergence of measures and furthermore if $(\nu^n_1, \dots, \nu^n_k)$
converges along a subsequence to $(\nu_1, \dots, \nu_k)$ then $(\nu_1, \dots, \nu_k) = (\nu \llcorner G_1, \dots, \nu \llcorner G_k)$, where
$G_1, \dots, G_k$ is a spectral clustering of $\nu$, described in Subsection 1.3.
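
To get a feel for the scaling condition (1.16), one can tabulate the quantity that is required to vanish for a trial sequence of connectivity radii. The sketch below does this in plain Python. The choice $\varepsilon_n = n^{-1/(2d)}$ and the function names are hypothetical and serve only as an illustration; the theorem itself does not single out any particular sequence.

```python
import math

def connectivity_ratio(n, d, eps):
    """Quantity in (1.16) that must tend to 0 as n grows."""
    if d == 2:
        return math.log(n) ** 0.75 / (math.sqrt(n) * eps)
    if d >= 3:
        return (math.log(n) / n) ** (1.0 / d) / eps
    raise ValueError("the condition is stated for d >= 2")

def eps_candidate(n, d):
    """Hypothetical choice eps_n = n^(-1/(2d)), used only for illustration."""
    return n ** (-1.0 / (2 * d))

sample_sizes = (10 ** 3, 10 ** 6, 10 ** 9)
ratios_d2 = [connectivity_ratio(n, 2, eps_candidate(n, 2)) for n in sample_sizes]
ratios_d3 = [connectivity_ratio(n, 3, eps_candidate(n, 3)) for n in sample_sizes]
```

For this trial sequence both lists decrease as the sample size grows, which is consistent with (1.16) being satisfied; a radius shrinking faster than the threshold, for instance $\varepsilon_n = n^{-1}$ in dimension 3, would make the ratio grow instead.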


Remark 1.3. We remark that although the choice of the $TL^2$-topology used in the previous theorem
may seem unusual at first sight, it actually reduces to a more common notion of convergence (like
the one used in [48], which we describe below) in the presence of regularity assumptions on the
density $\rho$ and the domain $D$. In fact, assume for simplicity that $D$ has smooth boundary and that
$\rho$ is a smooth function. Consider $\{u^n_k\}_{n \in \mathbb{N}}$ where $u^n_k$ is an eigenvector of $\mathcal{L}_{n,\varepsilon_n}$ associated to the
eigenvalue $\lambda^{(n)}_k$ and satisfying $\| u^n_k \|_{\nu_n} = 1$. The second statement in Theorem 1.2 says that, up to
a subsequence, $u^n_k \overset{TL^2}{\longrightarrow} u_k$, where $u_k$ is an eigenfunction of $\mathcal{L}$ associated to $\lambda_k$. From the regularity
theory of elliptic PDEs it follows that $u_k$ is smooth up to the boundary. In particular, it makes
sense to define a function $\tilde{u}^n_k$ on the point cloud, by simply taking the restriction of $u_k$ to the points
$\{x_1, \dots, x_n\}$. It is straightforward to check that $\tilde{u}^n_k \overset{TL^2}{\longrightarrow} u_k$ due to the smoothness of $u_k$. In turn,
$u^n_k \overset{TL^2}{\longrightarrow} u_k$ implies that $d_{TL^2}((\nu_n, u^n_k - \tilde{u}^n_k), (\nu, 0)) \to 0$. From this and Proposition 2.1 we conclude
that

$\| u^n_k - \tilde{u}^n_k \|_{\nu_n} \to 0$ as $n \to \infty$.

This is precisely the mode of convergence used in [48].
The proof of Theorem 1.2 relies on the study of the limiting behavior of the following rescaled
form of the Dirichlet energy (1.7) on the graph:

(1.18)    $G_{n,\varepsilon}(u) := \frac{1}{\varepsilon^2 n^2} \sum_{i,j} W^\varepsilon_{i,j} (u(x_i) - u(x_j))^2$.

The type of limit which is relevant for the problem is the one given by the variational convergence
known as $\Gamma$-convergence. The notion of $\Gamma$-convergence is recalled in Subsection 2.2. This notion
of convergence is particularly suitable for studying the convergence of minimizers of objective
functionals on graphs as $n \to \infty$, as discussed in [?].

The relevant continuum energy is the weighted Dirichlet energy $G : L^2(D) \to [0, \infty]$:

(1.19)    $G(u) := \int_D |\nabla u(x)|^2 \rho^2(x) \, dx$ if $u \in H^1(D)$,
          $G(u) := \infty$ if $u \in L^2(D) \setminus H^1(D)$.

Theorem 1.4 ($\Gamma$-convergence of Dirichlet energies). Consider the same setting as in Theorem 1.2
and the same assumptions on $\eta$ and on $\{\varepsilon_n\}_{n \in \mathbb{N}}$. Then $G_{n,\varepsilon_n}$, defined by (1.18), $\Gamma$-converge to
$\sigma_\eta G$ as $n \to \infty$ in the $TL^2$ sense, where $\sigma_\eta$ is given by (1.9) and $G$ is the weighted Dirichlet
energy with weight $\rho^2$ defined in (1.19). Moreover, the sequence of functionals $\{G_{n,\varepsilon_n}\}_{n \in \mathbb{N}}$ satisfies
the compactness property with respect to the $TL^2$-metric. That is, every sequence $\{u_n\}_{n \in \mathbb{N}}$ with
$u_n \in L^2(\nu_n)$ for which

$\sup_{n \in \mathbb{N}} \| u_n \|_{\nu_n} < \infty$,    $\sup_{n \in \mathbb{N}} G_{n,\varepsilon_n}(u_n) < \infty$,

is precompact in $TL^2$.

The fact that the weight in the limiting functional $G$ is $\rho^2$ (and not $\rho$) essentially follows from
the fact that the graph Dirichlet energy defined in (1.18) is a double sum. This is the same weight
that shows up in the study of the continuum limit of the graph total variation in [15]. Theorem 1.4
is analogous to Theorems 1.1 and 1.2 in [?] combined.
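
The appearance of the weight $\rho^2$ and of the constant defined in (1.9) can be made plausible by a formal computation for a smooth function $u$. The following is only a heuristic sketch, not the proof: it assumes that the rescaling in (1.18) is $1/(\varepsilon^2 n^2)$, replaces the double sum by a double integral against $\rho$, changes variables $y = x + \varepsilon h$, and ignores boundary effects.

```latex
\begin{align*}
G_{n,\varepsilon}(u)
  &= \frac{1}{\varepsilon^2 n^2} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\,\bigl(u(x_i) - u(x_j)\bigr)^2 \\
  &\approx \frac{1}{\varepsilon^2} \int_D \int_D \eta_\varepsilon(x - y)\,\bigl(u(x) - u(y)\bigr)^2 \rho(x)\,\rho(y)\,dy\,dx \\
  &\approx \int_D \int_{\mathbb{R}^d} \eta(h)\,\lvert \nabla u(x) \cdot h \rvert^2\,dh\;\rho^2(x)\,dx \\
  &= \sigma_\eta \int_D \lvert \nabla u(x) \rvert^2 \rho^2(x)\,dx .
\end{align*}
```

The two factors of $\rho$ come from the two sums over sample points, and the last step uses the radial symmetry of $\eta$, which gives $\int \eta(h) \lvert \xi \cdot h \rvert^2 dh = \lvert \xi \rvert^2 \int \eta(h) \lvert h_1 \rvert^2 dh$ for every vector $\xi$.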


1.7. Convergence of eigenvalues, eigenvectors, and of spectral clustering: the normalized
case. We also study the limit of the spectra of the symmetric normalized graph Laplacian
$\mathcal{N}^{sym}_{n,\varepsilon_n}$, which we recall is given by

$\mathcal{N}^{sym}_{n,\varepsilon_n} := \mathcal{D}^{-1/2} \mathcal{L}_{n,\varepsilon_n} \mathcal{D}^{-1/2}$.

For a function $u : V \to \mathbb{R}$, (1.3) can be written as

(1.20)    $\langle \mathcal{N}^{sym}_{n,\varepsilon_n} u, u \rangle = \frac{1}{2} \sum_{i,j} W_{i,j} \left( \frac{u(x_i)}{\sqrt{\mathcal{D}_{ii}}} - \frac{u(x_j)}{\sqrt{\mathcal{D}_{jj}}} \right)^2$.

We denote by

$0 = \tau^{(n)}_1 \le \dots \le \tau^{(n)}_n$

the eigenvalues of $\mathcal{N}^{sym}_{n,\varepsilon_n}$, repeated according to multiplicity. Their limit is described by the differential
operator $\mathcal{N}^{sym}$. Let

$0 = \tau_1 \le \tau_2 \le \dots$

denote the eigenvalues of $\mathcal{N}^{sym}$, repeated according to multiplicity. We write $0 = \bar{\tau}_1 < \bar{\tau}_2 < \dots$ to
denote the distinct eigenvalues of $\mathcal{N}^{sym}$. For a given $k \in \mathbb{N}$, we denote by $s(k)$ the multiplicity of the
eigenvalue $\bar{\tau}_k$ and we let $\hat{k} \in \mathbb{N}$ be such that $\bar{\tau}_k = \tau_{\hat{k}+1} = \dots = \tau_{\hat{k}+s(k)}$. We define $\mathrm{Proj}_k$ and $\mathrm{Proj}^{(n)}_k$
analogously to the way we defined them in the paragraph preceding Theorem 1.2. The following is
analogous to Theorem 1.2.


Theorem 1.5 (Convergence of the spectra of the normalized graph Laplacians). Consider the same
setting as in Theorem 1.2 and the same assumptions on $\eta$ and on $\{\varepsilon_n\}_{n \in \mathbb{N}}$. Then, with probability
one, all of the following statements hold:

1. Convergence of Eigenvalues: For every $k \in \mathbb{N}$

$\lim_{n \to \infty} \frac{2 \tau^{(n)}_k}{\varepsilon_n^2} = \frac{\sigma_\eta}{\beta_\eta} \tau_k$,

where $\beta_\eta$ is given by

(1.21)    $\beta_\eta := \int_{\mathbb{R}^d} \eta(h) \, dh$,

and where $\sigma_\eta$ is given by (1.9).

2. For every $k \in \mathbb{N}$, every sequence $\{u^n_k\}_{n \in \mathbb{N}}$ with $u^n_k$ being an eigenvector of $\mathcal{N}^{sym}_{n,\varepsilon_n}$ associated
to the eigenvalue $\tau^{(n)}_k$ and with $\| u^n_k \|_{\nu_n} = 1$ is pre-compact in $TL^2$. Additionally, whenever
$u^n_k \overset{TL^2}{\longrightarrow} u_k$ along a subsequence as $n \to \infty$, then $\| u_k \|_\nu = 1$ and $u_k$ is an eigenfunction of $\mathcal{N}^{sym}$
associated to $\tau_k$.

3. Convergence of Eigenprojections: For all $k \in \mathbb{N}$ and for arbitrary $v_n \in L^2(\nu_n)$, if $v_n \overset{TL^2}{\longrightarrow} v$
along a subsequence as $n \to \infty$, then

$\mathrm{Proj}^{(n)}_k(v_n) \overset{TL^2}{\longrightarrow} \mathrm{Proj}_k(v)$ as $n \to \infty$, along the subsequence.

4. Consistency of Spectral Clustering: Assume $\rho \in C^1(\overline{D})$. Let $G^n_1, \dots, G^n_k$ be the clusters
obtained in Algorithm 2. Let $\nu^n_i = \nu_n \llcorner G^n_i$ for $i = 1, \dots, k$. Then $(\nu^n_1, \dots, \nu^n_k)$ is precompact
with respect to weak convergence of measures and furthermore if $(\nu^n_1, \dots, \nu^n_k)$ converges
along a subsequence to $(\nu_1, \dots, \nu_k)$ then $(\nu_1, \dots, \nu_k) = (\nu \llcorner G_1, \dots, \nu \llcorner G_k)$, where $G_1, \dots, G_k$ is
a spectral clustering of $\nu$, described in Subsection 1.4.


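The proof of the previous theorem is completely analogous to the one of Theorem 1.2 once one
has proved the variational convergence of the relevant energies. Indeed, consider $\overline{G}_{n,\varepsilon} : L^2(\nu_n) \to \mathbb{R}$
defined by

(1.22)    $\overline{G}_{n,\varepsilon}(u) := \frac{1}{n \varepsilon^2} \sum_{i,j} W_{i,j} \left( \frac{u(x_i)}{\sqrt{\mathcal{D}_{ii}}} - \frac{u(x_j)}{\sqrt{\mathcal{D}_{jj}}} \right)^2$

and $\overline{G} : L^2(D) \to [0, \infty]$ by

(1.23)    $\overline{G}(u) := \int_D \left| \nabla \left( \frac{u}{\sqrt{\rho}} \right) \right|^2 \rho^2(x) \, dx$ if $u \in H^1_{\sqrt{\rho}}(D)$,
          $\overline{G}(u) := \infty$ if $u \in L^2(D) \setminus H^1_{\sqrt{\rho}}(D)$,

where $H^1_{\sqrt{\rho}}(D)$ is defined in (1.12). The following holds.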


Theorem 1.6 ($\Gamma$-convergence of normalized Dirichlet energies). With the same setting as in Theorem
1.5 and the same assumptions on $\eta$ and on $\{\varepsilon_n\}_{n \in \mathbb{N}}$, $\overline{G}_{n,\varepsilon_n}$, defined by (1.22), $\Gamma$-converge
to $\frac{\sigma_\eta}{\beta_\eta} \overline{G}$ as $n \to \infty$ in the $TL^2$-sense, where $\overline{G}$ is defined in (1.23), and $\sigma_\eta$ and $\beta_\eta$ are given by (1.9)
and (1.21) respectively. Moreover, the sequence of functionals $\{\overline{G}_{n,\varepsilon_n}\}_{n \in \mathbb{N}}$ satisfies the compactness
property with respect to the $TL^2$-metric. That is, every sequence $\{u_n\}_{n \in \mathbb{N}}$ with $u_n \in L^2(\nu_n)$ for
which

$\sup_{n \in \mathbb{N}} \| u_n \|_{\nu_n} < \infty$,    $\sup_{n \in \mathbb{N}} \overline{G}_{n,\varepsilon_n}(u_n) < \infty$,

is precompact in $TL^2$.

Finally, we consider the limit of the spectrum of $\mathcal{N}^{rw}_{n,\varepsilon_n}$, where $\mathcal{N}^{rw}_{n,\varepsilon_n} = \mathcal{D}^{-1} \mathcal{L}_{n,\varepsilon_n}$. Consider the
operator $\mathcal{N}^{rw}$ given by

$\mathcal{N}^{rw}(u) := -\frac{1}{\rho^2} \mathrm{div}(\rho^2 \nabla u)$.

As discussed in Subsection 2.4, the eigenvalues of $\mathcal{N}^{rw}$ are equal to the eigenvalues of $\mathcal{N}^{sym}$. Thus
from (1.4) and from Theorem 1.5 it follows that, after appropriate rescaling, the eigenvalues of
$\mathcal{N}^{rw}_{n,\varepsilon_n}$ converge to the eigenvalues of $\mathcal{N}^{rw}$. Moreover, using again (1.4) and Theorem 1.5 we have
the following convergence of eigenvectors.

Corollary 1.7. Consider the same setting as in Theorem 1.2 and the same assumptions on $\eta$ and
on $\{\varepsilon_n\}_{n \in \mathbb{N}}$. Then, with probability one, the following statement holds: For every $k \in \mathbb{N}$, every
sequence $\{u^n_k\}_{n \in \mathbb{N}}$ with $u^n_k$ being an eigenvector of $\mathcal{N}^{rw}_{n,\varepsilon_n}$ associated to the eigenvalue $\tau^{(n)}_k$ and with
$\| u^n_k \|_{\nu_n} = 1$ is pre-compact in $TL^2$. Additionally, all its cluster points are eigenfunctions of $\mathcal{N}^{rw}$
with eigenvalue $\tau_k$. Finally, the clusters obtained by Algorithm 3 converge to clusters obtained by
spectral clustering corresponding to $\mathcal{N}^{rw}$ described at the end of Subsection 1.4.

1.8. Stability of k-means clustering. One of the final elements of the proof of the consistency
results of spectral clustering (statement 4. in Theorems 1.2 and 1.5) requires new results on stability
of k-means clustering with respect to perturbations of the measure being clustered. These results
extend the result of Pollard m who proved the consistency of k-means clustering. It is important
to extend such results because in our setting, at the discrete level, the point set used as input for
the k-means algorithm is not a sample from a given distribution and thus one cannot apply the
results in m directly.

Given $k \in \mathbb{N}$ and given a measure $\mu$ on $\mathbb{R}^N$ with finite second moments, let $F_{\mu,k} : \mathbb{R}^{N \times k} \to [0, \infty)$
be defined by

(1.24) $F_{\mu,k}(z_1, \dots, z_k) := \int d(x, \{z_1, \dots, z_k\})^2 \, d\mu(x)$

where $z_i \in \mathbb{R}^N$ for $i = 1, \dots, k$. For brevity we write $z$ both for $(z_1, \dots, z_k)$ and $\{z_1, \dots, z_k\}$, where
the object considered should be clear from the context. The problem of k-means clustering is to
minimize $F_{\mu,k}$ over $\mathbb{R}^{N \times k}$. In Subsection 2.3 we show the existence of minimizers of the functional
(1.24). The main result is the following.
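
To make the functional (1.24) concrete, the following minimal sketch evaluates it for a discrete measure $\mu = \sum_j w_j \delta_{x_j}$. The function name `kmeans_energy` and the representation of the measure as lists of points and weights are illustrative assumptions and are not part of the paper.

```python
def kmeans_energy(points, weights, centers):
    """Evaluate F_{mu,k}(z) for mu = sum_j weights[j] * delta_{points[j]}.

    points  : list of tuples in R^N (atoms of the measure; illustrative input)
    weights : list of nonnegative masses summing to one
    centers : list of k tuples in R^N (the candidate centers z_1, ..., z_k)
    """
    total = 0.0
    for x, w in zip(points, weights):
        # d(x, {z_1, ..., z_k})^2: squared distance from x to the nearest center
        d2 = min(sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers)
        total += w * d2
    return total


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
weights = [0.25, 0.25, 0.25, 0.25]
two_centers = kmeans_energy(points, weights, [(0.5, 0.0), (10.5, 0.0)])
one_center = kmeans_energy(points, weights, [(5.5, 0.0)])
print(two_centers, one_center)
```

In this toy example placing one center in each of the two groups gives a much smaller energy than a single center placed in the middle, which is the behaviour that minimizing $F_{\mu,k}$ is meant to capture.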

Theorem 1.8 (Stability of k-means clustering). Let $k \ge 1$. Let $\mu$ be a Borel probability measure
on $\mathbb{R}^N$ with finite second moments and whose support has at least $k$ points. Assume $\{\mu_m\}_{m \in \mathbb{N}}$
is a sequence of probability measures on $\mathbb{R}^N$ with finite second moments which converges in the
Wasserstein distance (see (2.1)) to $\mu$. Then,

$\lim_{m \to \infty} \min_z F_{\mu_m,k}(z) = \min_z F_{\mu,k}(z).$

Moreover, if $z_m$ is a minimizer of $F_{\mu_m,k}$ for all $m$, then the set $\{z_m, m \in \mathbb{N}\}$ is precompact in $\mathbb{R}^{N \times k}$
and all of its accumulation points are minimizers of $F_{\mu,k}$.

We present the proof of the previous Theorem in Subsection 2.3.

The clusters corresponding to $z$ minimizing $F_{\mu,k}$ are the Voronoi cells: $G_i = \{x \in \mathbb{R}^N :
d(x, z) = d(x, z_i)\}$. We prove in Lemma 2.10 that the measure of the boundaries of clusters is zero,
that is, we show that if $i \ne j$ then $\mu(G_i \cap G_j) = 0$. In other words, it is irrelevant to which cluster
the points on the boundary are assigned and, because of this, we are allowed to define the clusters to
be either open or closed sets. We furthermore note that the associated measures, $\mu_i := \mu \llcorner_{G_i}$, are mutually
orthogonal and satisfy $\sum_i \mu \llcorner_{G_i} = \mu$.
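
The passage from centers to clusters can be sketched for a discrete measure as follows. The helper names `voronoi_labels` and `cluster_masses` are hypothetical, and ties on cell boundaries are broken by the smallest index, which is an arbitrary convention (harmless whenever the boundaries carry no mass).

```python
def voronoi_labels(points, centers):
    """Assign each point to the index of its nearest center (Voronoi cell)."""
    labels = []
    for x in points:
        d2 = [sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers]
        # list.index returns the first minimizer, so ties go to the smallest index
        labels.append(d2.index(min(d2)))
    return labels


def cluster_masses(weights, labels, k):
    """Total mass of the restriction of the measure to each of the k cells."""
    masses = [0.0] * k
    for w, lab in zip(weights, labels):
        masses[lab] += w
    return masses


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
weights = [0.25, 0.25, 0.25, 0.25]
labels = voronoi_labels(points, [(0.5,), (10.5,)])
masses = cluster_masses(weights, labels, 2)
print(labels, masses)
```

Because every point receives exactly one label, the restricted measures are mutually orthogonal and their masses add up to the total mass of the original measure.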

A consequence of Theorem 1.8 is that as the cluster centers converge so do the measures
representing the clusters.

Corollary 1.9. Under the assumptions of Theorem 1.8, if $z_m$ converge along a subsequence to $z$
then the measures $\mu_m \llcorner_{G_i^m}$ converge weakly in the sense of measures to $\mu \llcorner_{G_i}$ as $m \to \infty$ for all
$i = 1, \dots, k$.

The corollary follows from Theorem 1.8 since the convergence of centers of Voronoi cells, along
with the fact that the boundaries of cells change continuously with respect to cell centers, implies
that the measures converge in the Lévy-Prokhorov metric, which characterizes the weak convergence of
measures.

1.9. Discussion. Theorems 1.2, 1.5 and 1.7 establish the consistency of spectral clustering. An
important difference between our work and the available consistency results is that we provide an
explicit range of rates at which $\varepsilon_n$ (the length scale used to construct the graph) is allowed to
converge to 0 as $n \to \infty$. In [48] the parameter $\varepsilon$ is not allowed to depend on $n$. As a result,
the functional obtained in the limit is a non-local (i.e. integral, rather than differential) operator.
Operators with very different spectral properties are obtained in the limit depending on whether
one uses a normalized or unnormalized graph Laplacian. In particular, it is argued that normalized
spectral clustering is more advantageous than the unnormalized clustering, because in the normalized
case the spectrum of the limiting operator is better behaved and the spectral consistency in the
unnormalized case is only guaranteed in restrictive settings. We remark that our results show that
when the parameter $\varepsilon_n$ decays to zero such difference between the normalized and the unnormalized
settings disappears and the limiting operators in both cases have a discrete spectrum.

When constructing the graph it is advantageous, from the point of view of computational
complexity, to have fewer edges (that is, to take $\varepsilon$ small). However, below some threshold the graph
thus constructed does not contain enough information to accurately recover the geometry of the
underlying ground-truth distribution. How large $\varepsilon$ should be taken depends on $n$, the number of
data points available. As the number of data points increases, $\varepsilon$ converges to zero. We remark that for
$d \ge 3$, the results of Theorems 1.2, 1.5 and 1.7 are (almost) optimal in the sense of scaling. Namely,
we show that if the kernel $\eta$ used to construct the graph is compactly supported, then convergence
holds if $\varepsilon_n \gg \frac{(\log n)^{1/d}}{n^{1/d}}$, while if $\varepsilon_n \ll \frac{(\log n)^{1/d}}{n^{1/d}}$ the convergence does not hold. This follows from the

results on the connectivity of random geometric graphs in H2CE1I3Q] which show that with high 
probability for large n the graph thus obtained is disconnected. 

Finally, we remark that our results are essentially independent of the kernel used to construct the 
weights. For example, when the points are sampled from the uniform distribution on a domain D , 
our results show that the spectra of the graph Laplacians converge to the spectrum of the Laplacian 
on the domain D , regardless of the kernel used. 

1.10. Outline of the approach. Theorem 1.2 is based on the variational convergence of the
energies $G_{n,\varepsilon_n}$ towards $\sigma_\eta G$, together with the corresponding compactness result (Theorem 1.4). In
order to show Theorem 1.4 we first introduce the functional $G_{\varepsilon_n} : L^2(D, \rho) \to [0, \infty)$ given by

(1.25) $G_{\varepsilon_n}(u) := \frac{1}{\varepsilon_n^2} \int_D \int_D \eta_{\varepsilon_n}(x - y) |u(x) - u(y)|^2 \rho(x) \rho(y) \, dx \, dy,$

which serves as an intermediate object between the functionals $G_{n,\varepsilon_n}$ and $G$. It is important
to observe that the argument of $G_{n,\varepsilon_n}$ is a function $u_n$ supported on the data points, whereas the
argument of $G_{\varepsilon_n}$ is an $L^2(D, \rho)$ function; in particular, a function defined on $D$. The functional $G_{\varepsilon_n}$ is
a non-local functional, where the term non-local refers to the fact that differences of a given function
on an $\varepsilon_n$-neighborhood are averaged, in contrast with the local approach of averaging derivatives of
the given function. Non-local functionals have been of interest in the last decades due to their
wide range of applications, which includes phase transitions, image processing and PDEs. From a
statistical point of view, for a fixed function $u : D \to \mathbb{R}$, $G_{\varepsilon_n}(u)$ is nothing but the expectation of
$G_{n,\varepsilon_n}(u)$. On the other hand, the functional $G_{\varepsilon_n}$ is relevant for our purposes because not only does it
approximate $G$ defined in (1.19) in a pointwise sense, but it also approximates it in a variational
sense (as the parameter $\varepsilon_n$ goes to zero). More precisely, the following holds.
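
As an illustration of the discrete object being compared with (1.25), the sketch below evaluates a graph Dirichlet energy on a point cloud. It rests on assumptions that are not stated in this section: the discrete energy is taken to be $\frac{1}{\varepsilon^2 n^2} \sum_{i,j} \eta_\varepsilon(x_i - x_j) (u_i - u_j)^2$ with $\eta_\varepsilon(z) = \varepsilon^{-d} \eta(z/\varepsilon)$, and the kernel $\eta$ is taken to be the indicator of the unit ball. The function name is hypothetical.

```python
def graph_dirichlet_energy(points, u, eps):
    """Discrete analogue of (1.25) on a point cloud (assumed normalization).

    points : list of n tuples in R^d
    u      : list of n function values, u[i] = u(points[i])
    eps    : connectivity length scale
    Assumes eta = indicator of the unit ball and eta_eps(z) = eps**(-d) * eta(z / eps).
    """
    n = len(points)
    d = len(points[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if dist2 < eps ** 2:  # eta((x_i - x_j) / eps) = 1 exactly when |x_i - x_j| < eps
                total += (u[i] - u[j]) ** 2
    return total / (eps ** d * eps ** 2 * n ** 2)


points = [(0.0,), (0.5,), (1.0,)]
energy_linear = graph_dirichlet_energy(points, [0.0, 1.0, 2.0], 0.6)
energy_constant = graph_dirichlet_energy(points, [3.0, 3.0, 3.0], 0.6)
print(energy_linear, energy_constant)
```

Only pairs of points closer than `eps` contribute, so a smaller `eps` means fewer edges, and constant functions always have zero energy.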

Proposition 1.10. Consider an open, bounded domain $D$ in $\mathbb{R}^d$ with Lipschitz boundary. Let
$\rho : D \to \mathbb{R}$ be continuous and bounded below and above by positive constants. Let $\{\varepsilon_k\}_{k \in \mathbb{N}}$ be a
sequence of positive numbers converging to zero. Then, $\{G_{\varepsilon_k}\}_{k \in \mathbb{N}}$ (defined in (1.25)) $\Gamma$-converges
with respect to the $L^2(D, \rho)$-metric to $\sigma_\eta G$, where $\sigma_\eta$ is defined in (1.9) and $G$ is defined in (1.19).
Moreover, the functionals $\{G_{\varepsilon_k}\}_{k \in \mathbb{N}}$ satisfy the compactness property with respect to the $L^2(D, \rho)$-metric.
That is, every sequence $\{u_k\}_{k \in \mathbb{N}}$ with $u_k \in L^2(D, \rho)$ for which

$\sup_{k \in \mathbb{N}} \|u_k\|_{L^2(D, \rho)} < \infty, \quad \sup_{k \in \mathbb{N}} G_{\varepsilon_k}(u_k) < \infty,$

is precompact in $L^2(D, \rho)$. Finally, for every $u \in L^2(D, \rho)$

(1.26) $\lim_{k \to \infty} G_{\varepsilon_k}(u) = \sigma_\eta G(u).$


Proof. When $\rho$ is constant, the proof may be found in the Appendix of [T] in case $D$ is a convex
set, and in [32] for a general domain $D$ satisfying the assumptions in the statement. In case $\rho$ is
not constant the results are obtained in a straightforward way by adapting the arguments presented
in [32], just as is done in Section 4 of [15] when studying the variational limit of the non-local
functional

$TV_\varepsilon(u) := \frac{1}{\varepsilon} \int_D \int_D \eta_\varepsilon(x - y) |u(x) - u(y)| \rho(x) \rho(y) \, dx \, dy,$

which is the $L^1$ analogue of $G_\varepsilon$. □


As observed earlier, the argument of $G_{n,\varepsilon_n}$ is a function $u_n$ supported on the data points, while
the argument of $G_{\varepsilon_n}$ is an $L^2(D)$ function. For a function $u_n$ defined on the set $V = \{x_1, \dots, x_n\}$,
the idea is to associate an $L^2(D)$ function $\tilde{u}_n$ which approximates $u_n$ in the $TL^2$-sense and is
such that $G_{\varepsilon_n}(\tilde{u}_n)$ is comparable to $G_{n,\varepsilon_n}(u_n)$. The purpose of doing this is to use Proposition
1.10. We construct the approximating function $\tilde{u}_n$ by using transportation maps (i.e. measure
preserving maps) between the measure $\nu$ and $\nu_n$. More precisely, we set $\tilde{u}_n = u_n \circ T_n$ where $T_n$ is
a transportation map between $\nu$ and $\nu_n$ which moves mass as little as possible. The estimates on
how far the mass needs to be moved were known in the literature when $\rho$ is constant and when the
domain $D$ is the unit cube $(0, 1)^d$ (see [251 HU HU Ed] for $d = 2$ and [3S] for $d \ge 3$). In [TS] these
estimates are extended to general domains $D$ and densities $\rho$ satisfying (1.15). Indeed, the following
is proved.


Proposition 1.11. Let $D \subset \mathbb{R}^d$ be a bounded, connected, open set with Lipschitz boundary. Let
$\nu$ be a probability measure on $D$ with density $\rho : D \to (0, \infty)$ satisfying (1.15). Let $x_1, \dots, x_n, \dots$
be i.i.d. samples from $\nu$. Let $\nu_n$ be the empirical measure associated to the $n$ data points. Then,
for any fixed $\alpha > 2$, except on a set with probability $O(n^{-\alpha/2})$, there exists a transportation map
$T_n : D \to D$ between the measure $\nu$ and the measure $\nu_n$ (denoted $T_{n\sharp} \nu = \nu_n$) such that

$\|T_n - \mathrm{Id}\|_\infty \le C \begin{cases} \dfrac{\ln(n)^{3/4}}{n^{1/2}}, & \text{if } d = 2, \\ \dfrac{\ln(n)^{1/d}}{n^{1/d}}, & \text{if } d \ge 3, \end{cases}$

where $C$ depends only on $\alpha$, $D$, and the constants $m$, $M$.

From the previous result, Chebyshev's inequality and the Borel-Cantelli lemma one obtains the
following rate of convergence of the $\infty$-transportation distance between the empirical measures $\nu_n$ and
the measure $\nu$ (see ilfij for details associated to Proposition 1.11).

Proposition 1.12. Let $D$ be an open, connected and bounded subset of $\mathbb{R}^d$ which has Lipschitz
boundary. Let $\nu$ be a probability measure on $D$ with density $\rho$ satisfying (1.15). Let $x_1, \dots, x_n, \dots$
be a sequence of independent samples from $\nu$ and let $\nu_n$ be the associated empirical measures (1.14).
Then, there is a constant $C > 0$ such that with probability one, there exists a sequence of
transportation maps $\{T_n\}_{n \in \mathbb{N}}$ from $\nu$ to $\nu_n$ ($T_{n\sharp} \nu = \nu_n$) and such that:

(1.27) if $d = 2$ then $\limsup_{n \to \infty} \dfrac{n^{1/2} \|\mathrm{Id} - T_n\|_\infty}{(\log n)^{3/4}} \le C$

(1.28) and if $d \ge 3$ then $\limsup_{n \to \infty} \dfrac{n^{1/d} \|\mathrm{Id} - T_n\|_\infty}{(\log n)^{1/d}} \le C.$
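
To get a feeling for the rates in (1.27) and (1.28), the following small sketch tabulates the quantities $(\log n)^{3/4} / n^{1/2}$ (for $d = 2$) and $(\log n)^{1/d} / n^{1/d}$ (for $d \ge 3$). The function name `transport_rate` is an illustrative choice, and the unknown constant $C$ is deliberately left out, so the numbers indicate scaling only and are not bounds.

```python
import math


def transport_rate(n, d):
    """Scaling of the bound on ||Id - T_n||_inf from Proposition 1.12, without the constant C."""
    if d < 2:
        raise ValueError("the rates quoted in the text are stated for d >= 2")
    if d == 2:
        return math.log(n) ** 0.75 / math.sqrt(n)
    return (math.log(n) / n) ** (1.0 / d)


for n in (10 ** 3, 10 ** 6):
    print(n, transport_rate(n, 2), transport_rate(n, 3))
```

For a fixed dimension the rate decreases as $n$ grows, and for a fixed $n$ it deteriorates as the dimension increases, which is why larger connectivity radii are needed in higher dimensions.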

As shown in Section 3, Proposition 1.10 and Proposition 1.12 are the backbone of Theorem
1.4. Schematically,

$G_\varepsilon \overset{\Gamma}{\longrightarrow} \sigma_\eta G$ in $L^2$ + Proposition 1.12 $\Longrightarrow$ $G_{n,\varepsilon_n} \overset{\Gamma}{\longrightarrow} \sigma_\eta G$ in $TL^2$.

We note that the statement $G_\varepsilon \overset{\Gamma}{\longrightarrow} \sigma_\eta G$ is a purely analytic, purely deterministic fact. Proposition
1.12, on the other hand, contains all the probabilistic estimates needed to establish all the results in
this paper. Such estimates in particular provide the constraints on the parameter $\varepsilon_n$ in Theorem 1.4.
It is worth observing that Proposition 1.12 is a statement that only involves the underlying measure
$\nu$ and the empirical measure $\nu_n$, and that in particular it does not involve estimates on the difference
between the functional $G_{\varepsilon_n}(u)$ and the functional $G_{n,\varepsilon_n}(u)$ for $u$ belonging to a small (in the sense
of VC-dimension) class of functions. In other words, our estimates are related to the domains where
the functions are defined (discrete/continuous) and not to the actual values of functions defined on
those domains.


With Theorem 1.4 at hand, the proof of Theorem 1.2 now relies on some spectral properties of
the operator $\mathcal{L}$ and analogous properties of $\mathcal{L}_{n,\varepsilon_n}$. As shown in Section 2.4, the space $L^2(D, \rho)$ has
a countable orthonormal basis (with respect to the inner product $\langle \cdot, \cdot \rangle_\rho$) formed with eigenfunctions
of $\mathcal{L}$. Additionally, the different eigenvalues of $\mathcal{L}$ can be organized as an increasing sequence of
nonnegative numbers converging to infinity. Each of the eigenvalues has finite multiplicity. Moreover,
the eigenvalues of $\mathcal{L}$ have a variational characterization, as they can be written as the optimal
value of optimization problems over successive subspaces of $L^2(D, \rho)$. This is the content of the
Courant-Fischer mini-max principle, which states that for every $k$

(1.29) $\lambda_k = \sup_{S \in \Sigma_{k-1}} \ \min_{\|u\|_\rho = 1, \ u \in S^\perp} G(u),$

where we recall that $0 = \lambda_1 \le \lambda_2 \le \dots$ denote the eigenvalues of $\mathcal{L}$ repeated according to multiplicity,
$\Sigma_{k-1}$ denotes the set of $(k-1)$-dimensional subspaces of $L^2(D, \rho)$, and where $S^\perp$ represents the
orthogonal complement of $S$ with respect to the inner product $\langle \cdot, \cdot \rangle_\rho$. Moreover, the supremum in
(1.29) is attained by the span of the first $(k-1)$ eigenfunctions of $\mathcal{L}$. In Subsection 2.4 we review
the previously mentioned spectral properties of $\mathcal{L}$. Likewise, we can write the eigenvalues of $\mathcal{L}_{n,\varepsilon_n}$
as


(1.30) $\lambda_k^{(n)} = \sup_{S \in \Sigma_{k-1}^{(n)}} \ \min_{\|u\|_{\nu_n} = 1, \ u \in S^\perp} G_{n,\varepsilon_n}(u),$

where $\Sigma_{k-1}^{(n)}$ denotes the set of $(k-1)$-dimensional subspaces of $\mathbb{R}^n$, and where $S^\perp$ represents the
orthogonal complement of $S$ with respect to the inner product $\langle \cdot, \cdot \rangle_{\nu_n}$ in $\mathbb{R}^n$. Moreover, as in the
continuum setting, the supremum in (1.30) is attained by the span of the first $(k-1)$ eigenvectors
of $\mathcal{L}_{n,\varepsilon_n}$. Theorem 1.4 allows us to exploit expressions (1.30) and (1.29), and in fact in Section 3
we show how (1.30) and (1.29) together with Theorem 1.4 imply Theorem 1.2, thus establishing the
spectral convergence in the unnormalized case.

In the normalized case, the same approach used in the unnormalized case can be taken. In fact,
the proof of Theorem 1.5 follows from the proof of Theorem 1.2, mutatis mutandis, after Theorem
1.6 has been proved.

The paper is organized as follows. Section 2 contains the notation and the background we need
in the rest of the paper. In particular, in Subsection 2.1 we review some facts about the $TL^2$ space,
in Subsection 2.2 we review the definition of $\Gamma$-convergence, in Subsection 2.3 we present some
results on stability of k-means clustering, and in Subsection 2.4 some facts about the spectrum of
the operators $\mathcal{L}$, $\mathcal{N}^{sym}$ and $\mathcal{N}^{rw}$. In Section 3 we prove Theorem 1.4 and Theorem 1.2. Finally, in
Section 4 we prove Theorem 1.6, Theorem 1.5 and Corollary 1.7.


2. Preliminaries 


2.1. Transportation theory and the $TL^2$ space. Let $D$ be an open domain in $\mathbb{R}^d$. We denote
by $\mathfrak{B}(D)$ the Borel $\sigma$-algebra of $D$, by $\mathcal{P}(D)$ the set of all Borel probability measures on $D$ and by
$\mathcal{P}_2(D)$ the Borel probability measures on $D$ with finite second moments. The Wasserstein distance
between $\mu, \tilde{\mu} \in \mathcal{P}_2(D)$ (denoted by $d_2(\mu, \tilde{\mu})$) is defined by:

(2.1) $d_2(\mu, \tilde{\mu}) := \min \left\{ \left( \iint_{D \times D} |x - y|^2 \, d\pi(x, y) \right)^{1/2} : \pi \in \Gamma(\mu, \tilde{\mu}) \right\},$

where $\Gamma(\mu, \tilde{\mu})$ is the set of all couplings between $\mu$ and $\tilde{\mu}$, that is, the set of all Borel probability
measures on $D \times D$ for which the marginal on the first variable is $\mu$ and the marginal on the second
variable is $\tilde{\mu}$. The elements $\pi \in \Gamma(\mu, \tilde{\mu})$ are also referred to as transportation plans between $\mu$ and
$\tilde{\mu}$. The existence of minimizers, which justifies the definition above, is straightforward to show, see
[46]. It is known that the convergence in Wasserstein metric is equivalent to weak convergence of
probability measures and uniform integrability of second moments.

In the remainder, unless otherwise stated, we assume that $D$ is a bounded set. In that setting,
we have $\mathcal{P}(D) = \mathcal{P}_2(D)$ and uniform integrability of second moments is immediate. In particular,
convergence in the Wasserstein metric is equivalent to weak convergence of measures. For details
see for instance [46], [2] and the references within. In particular, $\mu_n \overset{w}{\longrightarrow} \mu$ (to be read: $\mu_n$ converges
weakly to $\mu$) if and only if there is a sequence of transportation plans between $\mu_n$ and $\mu$, $\{\pi_n\}_{n \in \mathbb{N}}$,
for which:

(2.2) $\lim_{n \to \infty} \iint_{D \times D} |x - y|^2 \, d\pi_n(x, y) = 0.$

Actually, note that if $D$ is bounded, (2.2) is equivalent to $\lim_{n \to \infty} \iint_{D \times D} |x - y| \, d\pi_n(x, y) = 0$.
We say that a sequence of transportation plans, $\{\pi_n\}_{n \in \mathbb{N}}$ (with $\pi_n \in \Gamma(\mu, \mu_n)$), is stagnating if it
satisfies (2.2). Given a Borel map $T : D \to D$ and $\mu \in \mathcal{P}(D)$, the push-forward of $\mu$ by $T$, denoted
by $T_\sharp \mu \in \mathcal{P}(D)$, is given by:

$T_\sharp \mu(A) := \mu(T^{-1}(A)), \quad A \in \mathfrak{B}(D).$

For any bounded Borel function $\varphi : D \to \mathbb{R}$ the following change of variables in the integral holds:

(2.3) $\int_D \varphi(x) \, d(T_\sharp \mu)(x) = \int_D \varphi(T(x)) \, d\mu(x).$
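
For a discrete measure the push-forward and the change of variables (2.3) can be checked directly. The sketch below is only an illustration: the measure is stored as a dictionary from atoms to masses, and the helper names `push_forward` and `integrate` are hypothetical.

```python
from collections import defaultdict


def push_forward(mu, T):
    """Push-forward T_# mu of a discrete measure mu given as {atom: mass}."""
    out = defaultdict(float)
    for x, mass in mu.items():
        out[T(x)] += mass  # atoms with the same image under T merge their mass
    return dict(out)


def integrate(phi, mu):
    """Integral of phi against the discrete measure mu."""
    return sum(phi(x) * mass for x, mass in mu.items())


mu = {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25}
T = lambda x: x * x
phi = lambda y: 3.0 * y + 1.0

nu = push_forward(mu, T)
lhs = integrate(phi, nu)                   # integral of phi against T_# mu
rhs = integrate(lambda x: phi(T(x)), mu)   # integral of phi composed with T against mu
print(nu, lhs, rhs)
```

Both sides of (2.3) agree here even though $T$ is not injective, since the masses of the atoms $-1$ and $1$ are simply added at their common image.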

We say that a Borel map $T : D \to D$ is a transportation map between the measures $\mu \in \mathcal{P}(D)$
and $\tilde{\mu} \in \mathcal{P}(D)$ if $\tilde{\mu} = T_\sharp \mu$. In this case, we associate a transportation plan $\pi_T \in \Gamma(\mu, \tilde{\mu})$ to $T$ by:

(2.4) $\pi_T := (\mathrm{Id} \times T)_\sharp \mu,$

where $(\mathrm{Id} \times T) : D \to D \times D$ is given by $(\mathrm{Id} \times T)(x) = (x, T(x))$.

It is well known that when the measure $\mu \in \mathcal{P}_2(D)$ is absolutely continuous with respect to the
Lebesgue measure, the problem on the right hand side of (2.1) is equivalent to:

(2.5) $\min \left\{ \left( \int_D |x - T(x)|^2 \, d\mu(x) \right)^{1/2} : T_\sharp \mu = \tilde{\mu} \right\}.$

In fact, the problem (2.1) has a unique solution which is induced (via (2.4)) by a transportation
map $T$ solving (2.5) (see [46]). In particular, boundedness of $D$ implies that when $\mu$ has a density,
then $\mu_n \overset{w}{\longrightarrow} \mu$ as $n \to \infty$ is equivalent to the existence of a sequence of transportation maps
$\{T_n\}_{n \in \mathbb{N}}$ (with $T_{n\sharp} \mu = \mu_n$) such that:

(2.6) $\int_D |x - T_n(x)|^2 \, d\mu(x) \to 0, \quad \text{as } n \to \infty.$

We say that a sequence of transportation maps $\{T_n\}_{n \in \mathbb{N}}$ is stagnating if it satisfies (2.6).

We now introduce the space of objects that allows us to simultaneously consider the discrete and
continuum settings. Let

$TL^2(D) := \{(\mu, f) : \mu \in \mathcal{P}_2(D), \ f \in L^2(\mu)\},$

where $L^2(\mu)$ denotes the space of $L^2$ functions with respect to the measure $\mu$. For $(\mu, f)$, $(\nu, g)$ in $TL^2$
define

(2.7) $d_{TL^2}((\mu, f), (\nu, g)) = \inf_{\pi \in \Gamma(\mu, \nu)} \left( \iint_{D \times D} |x - y|^2 + |f(x) - g(y)|^2 \, d\pi(x, y) \right)^{1/2}.$

The set $TL^2$ and $d_{TL^2}$ were introduced in [15], where it was also proved that $d_{TL^2}$ is a metric. Note
that if we delete the second term on the right hand side of (2.7) we recover the Wasserstein distance
between the measures $\mu$ and $\nu$. The idea of introducing the second term on the right hand side of
(2.7) is to make it possible to compare functions in spaces as different as point clouds and continuous
domains. We have the following characterization of convergence in $TL^2$. See [T77] [Propositions 3.3
and 3.12] for its proof.
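
For two point clouds with the same number of points, each carrying the uniform measure, the infimum in (2.7) can be computed by brute force. The sketch below relies on an assumption worth stating explicitly: for uniform measures on $n$ atoms each, an optimal transportation plan can be chosen to be a permutation (a standard consequence of the Birkhoff-von Neumann theorem), so it suffices to search over matchings. The function name `tl2_distance` is hypothetical and the search is only feasible for very small $n$.

```python
import itertools
import math


def tl2_distance(xs, f, ys, g):
    """TL^2 distance between (mu, f) and (nu, g) for uniform measures on equally many atoms.

    xs, ys : lists of n tuples (atoms of mu and nu)
    f, g   : lists of n function values, f[i] = f(xs[i]) and g[j] = g(ys[j])
    Assumes equal cardinality, so that an optimal plan is a permutation.
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("this sketch only handles point clouds of equal size")
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = 0.0
        for i, j in enumerate(perm):
            spatial = sum((a - b) ** 2 for a, b in zip(xs[i], ys[j]))
            cost += spatial + (f[i] - g[j]) ** 2  # integrand of (2.7) at the matched pair
        best = min(best, cost / n)
    return math.sqrt(best)


xs = [(0.0,), (1.0,)]
same = tl2_distance(xs, [0.0, 1.0], xs, [0.0, 1.0])
swapped = tl2_distance(xs, [0.0, 1.0], xs, [1.0, 0.0])
print(same, swapped)
```

In the second example the measures coincide but the functions do not, and no matching can make both the spatial term and the function term small at once, which is why the distance is positive.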


Proposition 2.1. Let $(\mu, f) \in TL^2$ and let $\{(\mu_n, f_n)\}_{n \in \mathbb{N}}$ be a sequence in $TL^2$. The following
statements are equivalent:

1. $(\mu_n, f_n) \overset{TL^2}{\longrightarrow} (\mu, f)$ as $n \to \infty$.

2. The graphs of functions considered as measures converge in the Wasserstein sense (2.1),
that is

$(I \times f_n)_\sharp \mu_n \overset{d_2}{\longrightarrow} (I \times f)_\sharp \mu$ as $n \to \infty$.

3. $\mu_n \overset{w}{\longrightarrow} \mu$ and for every stagnating sequence of transportation plans $\{\pi_n\}_{n \in \mathbb{N}}$ (with $\pi_n \in
\Gamma(\mu, \mu_n)$)

(2.8) $\iint_{D \times D} |f(x) - f_n(y)|^2 \, d\pi_n(x, y) \to 0, \quad \text{as } n \to \infty.$

4. $\mu_n \overset{w}{\longrightarrow} \mu$ and there exists a stagnating sequence of transportation plans $\{\pi_n\}_{n \in \mathbb{N}}$ (with $\pi_n \in
\Gamma(\mu, \mu_n)$) for which (2.8) holds.

Moreover, if the measure $\mu$ is absolutely continuous with respect to the Lebesgue measure, the
following are equivalent to the previous statements:

5. $\mu_n \overset{w}{\longrightarrow} \mu$ and there exists a stagnating sequence of transportation maps $\{T_n\}_{n \in \mathbb{N}}$ (with $T_{n\sharp} \mu =
\mu_n$) such that:

(2.9) $\int_D |f(x) - f_n(T_n(x))|^2 \, d\mu(x) \to 0, \quad \text{as } n \to \infty.$

6. $\mu_n \overset{w}{\longrightarrow} \mu$ and for any stagnating sequence of transportation maps $\{T_n\}_{n \in \mathbb{N}}$ (with $T_{n\sharp} \mu = \mu_n$)
(2.9) holds.

Remark 2.2. One can think of the convergence in $TL^2$ as a generalization of weak convergence of
measures and convergence in $L^2$ of functions. By this we mean that $\{\mu_n\}_{n \in \mathbb{N}}$ in $\mathcal{P}(D)$ converges
weakly (and in the Wasserstein sense) to $\mu \in \mathcal{P}(D)$ if and only if $(\mu_n, 1) \overset{TL^2}{\longrightarrow} (\mu, 1)$ as $n \to \infty$, and
that for $\mu \in \mathcal{P}(D)$ a sequence $\{f_n\}_{n \in \mathbb{N}}$ in $L^2(\mu)$ converges in $L^2(\mu)$ to $f$ if and only if $(\mu, f_n) \overset{TL^2}{\longrightarrow}
(\mu, f)$ as $n \to \infty$. The last fact is established in Proposition 2.1.

Definition 2.3. Suppose $\{\mu_n\}_{n \in \mathbb{N}}$ in $\mathcal{P}(D)$ converges weakly to $\mu \in \mathcal{P}(D)$. We say that the sequence
$\{u_n\}_{n \in \mathbb{N}}$ (with $u_n \in L^2(\mu_n)$) converges in the $TL^2$-sense to $u \in L^2(\mu)$, if $\{(\mu_n, u_n)\}_{n \in \mathbb{N}}$ converges
to $(\mu, u)$ in the $TL^2$-metric. In this case we use a slight abuse of notation and write $u_n \overset{TL^2}{\longrightarrow} u$
as $n \to \infty$. Also, we say the sequence $\{u_n\}_{n \in \mathbb{N}}$ (with $u_n \in L^2(\mu_n)$) is precompact in $TL^2$ if the
sequence $\{(\mu_n, u_n)\}_{n \in \mathbb{N}}$ is precompact in $TL^2$.

Remark 2.4. Thanks to Proposition 2.1, when $\mu$ is absolutely continuous with respect to the Lebesgue
measure, $u_n \overset{TL^2}{\longrightarrow} u$ as $n \to \infty$ if and only if for every (or one) stagnating sequence $\{T_n\}_{n \in \mathbb{N}}$ of
transportation maps (with $T_{n\sharp} \mu = \mu_n$) it is true that $u_n \circ T_n \overset{L^2(\mu)}{\longrightarrow} u$ as $n \to \infty$. Also, $\{u_n\}_{n \in \mathbb{N}}$ is
precompact in $TL^2$ if and only if for every (or one) stagnating sequence $\{T_n\}_{n \in \mathbb{N}}$ of transportation
maps (with $T_{n\sharp} \mu = \mu_n$) it is true that $\{u_n \circ T_n\}_{n \in \mathbb{N}}$ is pre-compact in $L^2(\mu)$.

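Lemma 2.5. Let $\mu_n$ be a sequence of Borel probability measures on $\mathbb{R}^N$ with finite second moments,
converging to a probability measure $\mu$ in the Wasserstein sense. Let $A_n$ be $\mu_n$ measurable, and $A$ be
$\mu$ measurable. Let $\tilde{\mu}_n = \mu_n \llcorner_{A_n}$ and $\tilde{\mu} = \mu \llcorner_A$. Then,

(2.10) $(\mu_n, \chi_{A_n}) \overset{TL^2}{\longrightarrow} (\mu, \chi_A)$ if and only if $\tilde{\mu}_n \overset{w}{\longrightarrow} \tilde{\mu}$

as $n \to \infty$.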

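Proof. From Proposition 2.1 it follows that $(\mu_n, \chi_{A_n}) \overset{TL^2}{\longrightarrow} (\mu, \chi_A)$ if and only if

$\tilde{\mu}_n \times \delta_{\{1\}} + (\mu_n - \tilde{\mu}_n) \times \delta_{\{0\}} \overset{d_2}{\longrightarrow} \tilde{\mu} \times \delta_{\{1\}} + (\mu - \tilde{\mu}) \times \delta_{\{0\}}, \quad \text{as } n \to \infty.$

Since convergence in Wasserstein distance implies weak convergence, we deduce that $\tilde{\mu}_n \times \delta_{\{1\}} +
(\mu_n - \tilde{\mu}_n) \times \delta_{\{0\}} \overset{w}{\longrightarrow} \tilde{\mu} \times \delta_{\{1\}} + (\mu - \tilde{\mu}) \times \delta_{\{0\}}$, and in particular we conclude that

$\tilde{\mu}_n \overset{w}{\longrightarrow} \tilde{\mu}, \quad \text{as } n \to \infty.$

Conversely, the weak convergence $\tilde{\mu}_n \overset{w}{\longrightarrow} \tilde{\mu}$, together with the fact that $\mu_n \overset{d_2}{\longrightarrow} \mu$ (which in
particular implies that $\mu_n \overset{w}{\longrightarrow} \mu$), imply that

$\tilde{\mu}_n \times \delta_{\{1\}} + (\mu_n - \tilde{\mu}_n) \times \delta_{\{0\}} \overset{w}{\longrightarrow} \tilde{\mu} \times \delta_{\{1\}} + (\mu - \tilde{\mu}) \times \delta_{\{0\}}.$

In order to conclude that the above convergence also holds in the Wasserstein sense, we simply
note that this follows from the uniform integrability of the second moments of $\{\tilde{\mu}_n\}_{n \in \mathbb{N}}$, which
in turn follows from

$\lim_{t \to \infty} \sup_{n \in \mathbb{N}} \int_{\{|x| > t\}} |x|^2 \, d\tilde{\mu}_n(x) \le \lim_{t \to \infty} \sup_{n \in \mathbb{N}} \int_{\{|x| > t\}} |x|^2 \, d\mu_n(x) = 0.$

The equality in the previous expression follows from the fact that $\mu_n \overset{d_2}{\longrightarrow} \mu$. □

The following proposition states that inner products are continuous with respect to the $TL^2$
convergence.

Proposition 2.6. Suppose that $(\mu_n, u_n) \overset{TL^2}{\longrightarrow} (\mu, u)$ and $(\mu_n, v_n) \overset{TL^2}{\longrightarrow} (\mu, v)$ as $n \to \infty$. Then,

(2.11) $\lim_{n \to \infty} \langle u_n, v_n \rangle_{\mu_n} = \langle u, v \rangle_\mu.$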


XL 

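Proof. By the polarization identity, it is enough to prove that if $(\mu_n, u_n) \overset{TL^2}{\longrightarrow} (\mu, u)$ then

(2.12) $\lim_{n \to \infty} \|u_n\|_{\mu_n} = \|u\|_\mu.$

For this purpose, consider a stagnating sequence of transportation plans $\{\pi_n\}_{n \in \mathbb{N}}$ with $\pi_n \in \Gamma(\mu, \mu_n)$.
We can write $\|u_n\|_{\mu_n} = \left( \iint_{D \times D} |u_n(y)|^2 \, d\pi_n(x, y) \right)^{1/2}$ and $\|u\|_\mu = \left( \iint_{D \times D} |u(x)|^2 \, d\pi_n(x, y) \right)^{1/2}$.
Hence,

(2.13) $\left| \|u_n\|_{\mu_n} - \|u\|_\mu \right| \le \left( \iint_{D \times D} |u_n(y) - u(x)|^2 \, d\pi_n(x, y) \right)^{1/2} \to 0, \quad \text{as } n \to \infty.$

□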


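In proving the convergence of k-means clustering (statement 4. in Theorems 1.2 and 1.5), we
also need the following result on $TL^2$ convergence of a composition of functions.

Lemma 2.7 (Continuity of composition in $TL^2$). Let $\{\mu_n\}_{n \in \mathbb{N}}$ and $\mu$ be a collection of Borel
probability measures on $\mathbb{R}^d$ with finite second moments. Let $F_n \in L^2(\mu_n, \mathbb{R}^d : \mathbb{R}^k)$ for all $n \in \mathbb{N}$ and
$F \in L^2(\mu, \mathbb{R}^d : \mathbb{R}^k)$. Consider the measures $\tilde{\mu}_n := F_{n\sharp} \mu_n$ for all $n \in \mathbb{N}$ and $\tilde{\mu} := F_\sharp \mu$. Finally, let
$f_n \in L^2(\tilde{\mu}_n, \mathbb{R}^k : \mathbb{R})$ for all $n \in \mathbb{N}$ and $f \in L^2(\tilde{\mu}, \mathbb{R}^k : \mathbb{R})$. If

$(\mu_n, F_n) \overset{TL^2}{\longrightarrow} (\mu, F)$ as $n \to \infty$,

and

$(\tilde{\mu}_n, f_n) \overset{TL^2}{\longrightarrow} (\tilde{\mu}, f)$ as $n \to \infty$,

then

$(\mu_n, f_n \circ F_n) \overset{TL^2}{\longrightarrow} (\mu, f \circ F)$ as $n \to \infty$.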

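Proof. First of all note that the fact that $F_n \in L^2(\mu_n, \mathbb{R}^d : \mathbb{R}^k)$ and $F \in L^2(\mu, \mathbb{R}^d : \mathbb{R}^k)$ guarantees
that $\tilde{\mu}_n$ and $\tilde{\mu}$ are probability measures on $\mathbb{R}^k$ with finite second moments. On the other hand,
$(\mu_n, F_n) \overset{TL^2}{\longrightarrow} (\mu, F)$ as $n \to \infty$ implies the existence of a stagnating sequence of transportation plans
$\{\pi_n\}_{n \in \mathbb{N}}$ with $\pi_n \in \Gamma(\mu, \mu_n)$ such that

(2.14) $\lim_{n \to \infty} \iint_{\mathbb{R}^d \times \mathbb{R}^d} |F(x) - F_n(y)|^2 \, d\pi_n(x, y) = 0.$

We consider the measures $\tilde{\pi}_n := (F \times F_n)_\sharp \pi_n$ for all $n \in \mathbb{N}$. It is straightforward to check that
$\tilde{\pi}_n \in \Gamma(\tilde{\mu}, \tilde{\mu}_n)$ for all $n \in \mathbb{N}$ and, by the definition of $\tilde{\pi}_n$, that

$\lim_{n \to \infty} \iint_{\mathbb{R}^k \times \mathbb{R}^k} |x - y|^2 \, d\tilde{\pi}_n(x, y) = \lim_{n \to \infty} \iint_{\mathbb{R}^d \times \mathbb{R}^d} |F(x) - F_n(y)|^2 \, d\pi_n(x, y) = 0.$

In other words, $\{\tilde{\pi}_n\}_{n \in \mathbb{N}}$ is a stagnating sequence of transportation plans with $\tilde{\pi}_n \in \Gamma(\tilde{\mu}, \tilde{\mu}_n)$. From
the fact that $(\tilde{\mu}_n, f_n) \overset{TL^2}{\longrightarrow} (\tilde{\mu}, f)$ as $n \to \infty$ and from Proposition 2.1 it follows that

$\lim_{n \to \infty} \iint_{\mathbb{R}^k \times \mathbb{R}^k} |f(x) - f_n(y)|^2 \, d\tilde{\pi}_n(x, y) = 0.$

But again by the definition of $\tilde{\pi}_n$ we deduce

$\lim_{n \to \infty} \iint_{\mathbb{R}^d \times \mathbb{R}^d} |f(F(x)) - f_n(F_n(y))|^2 \, d\pi_n(x, y) = \lim_{n \to \infty} \iint_{\mathbb{R}^k \times \mathbb{R}^k} |f(x) - f_n(y)|^2 \, d\tilde{\pi}_n(x, y) = 0.$

Using again Proposition 2.1 we obtain the desired result. □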

2.2. $\Gamma$-convergence. We recall the notion of $\Gamma$-convergence in a general setting.

Definition 2.8. Let $(X, d_X)$ be a metric space and let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $\{F_n\}_{n \in \mathbb{N}}$
be a sequence of (random) functionals $F_n : X \times \Omega \to [0, \infty]$ and let $F$ be a (deterministic) functional
$F : X \to [0, \infty]$. We say that the sequence of functionals $\{F_n\}_{n \in \mathbb{N}}$ $\Gamma$-converges (in the $d_X$ metric)
to $F$, if for almost every $\omega \in \Omega$, all of the following conditions hold:

(1) Liminf inequality: For all $x \in X$ and all sequences $\{x_n\}_{n \in \mathbb{N}}$ converging to $x$ in the metric
$d_X$ it is true that

$\liminf_{n \to \infty} F_n(x_n, \omega) \ge F(x).$

(2) Limsup inequality: For all $x \in X$ there exists a sequence $\{x_n\}_{n \in \mathbb{N}}$ converging to $x$ in the
metric $d_X$ such that

$\limsup_{n \to \infty} F_n(x_n, \omega) \le F(x).$

The notion of T-convergence is particularly useful when combined with an appropriate notion of 
compactness. See El El- 

Definition 2.9. We say that the sequence of nonnegative random functionals $\{F_n\}_{n \in \mathbb{N}}$ satisfies the
compactness property if for almost every $\omega \in \Omega$, it is true that every bounded (with respect to $d_X$)
sequence $\{x_n\}_{n \in \mathbb{N}}$ in $X$ for which

$\sup_{n \in \mathbb{N}} F_n(x_n, \omega) < \infty,$

is precompact in $X$.

Now that we have defined the $TL^2$-space, and we have defined the notion of $\Gamma$-convergence, we
can rephrase the content of Theorem 1.4 in the following way. Under the conditions on the domain
$D$, the density $\rho$ and the parameter $\varepsilon_n$ in Theorem 1.4, with probability one, all of the following
statements hold:

(1) Liminf inequality: For all $u \in L^2(\nu)$, and all sequences $\{u_n\}_{n \in \mathbb{N}}$ with $u_n \in L^2(\nu_n)$ and
with $u_n \overset{TL^2}{\longrightarrow} u$ it is true that

$\liminf_{n \to \infty} G_{n,\varepsilon_n}(u_n) \ge \sigma_\eta G(u).$

(2) Limsup inequality: For all $u \in L^2(\nu)$, there exists a sequence $\{u_n\}_{n \in \mathbb{N}}$ with $u_n \in L^2(\nu_n)$
and with $u_n \overset{TL^2}{\longrightarrow} u$ for which

$\limsup_{n \to \infty} G_{n,\varepsilon_n}(u_n) \le \sigma_\eta G(u).$

(3) Compactness: Every sequence $\{u_n\}_{n \in \mathbb{N}}$ with $u_n \in L^2(\nu_n)$, satisfying

$\sup_{n \in \mathbb{N}} \|u_n\|_{\nu_n} < \infty, \quad \sup_{n \in \mathbb{N}} G_{n,\varepsilon_n}(u_n) < \infty,$

is precompact in $TL^2$, that is, every subsequence of $\{u_n\}_{n \in \mathbb{N}}$ has a further subsequence,
which converges in the $TL^2$-sense to an element of $L^2(D)$.

In a similar fashion we can rephrase the content of Theorem 1.6.

2.3. Stability of k-means clustering. Here we prove some basic facts about the functional $F_{\mu,k}$
defined in (1.24) and about Theorem 1.8. Our first observation is that there exist minimizers of $F_{\mu,k}$.
We note that $F_{\mu,k}$ is a continuous function which is non-negative and the existence of minimizers
can be obtained from a straightforward application of the direct method of the calculus of variations,
as we now illustrate. If the support of $\mu$ has $k$ or fewer points, then including these points in $z$
provides a minimizer for which $F_{\mu,k}(z) = 0$. On the other hand, if the support of $\mu$ has more than $k$
points, then to show that a minimizer exists it is enough to obtain pre-compactness of a minimizing
sequence, due to the continuity of $F_{\mu,k}$. Let $\{z^m\}_{m \in \mathbb{N}}$ be a minimizing sequence of $F_{\mu,k}$.

By considering a subsequence we can assume that for any $i = 1, \dots, k$, $z_i^m$ either converges to
some $z_i \in \mathbb{R}^N$ or diverges to infinity. Also, without loss of generality we can assume that for some
$1 \le l \le k + 1$, the sequence $\{z_i^m\}_{m \in \mathbb{N}}$ converges for $i < l$ and diverges for $i \ge l$. Our goal is to show
that $l = k + 1$. Assume for the sake of contradiction that $l \le k$. First note that if $l = 1$ (when no
subsequence converges) then $F_{\mu,k}(z^m) \to \infty$ as $m \to \infty$, which is impossible. So we can assume
that $z_1^m$ converges to $z_1$ as $m \to \infty$. It is straightforward to show, using the finiteness of the second
moment of $\mu$, that

(2.15) $F_{\mu,k}(z^m) \to F_{\mu,l-1}(\{z_1, \dots, z_{l-1}\}), \quad \text{as } m \to \infty.$

However, unless $l = k + 1$, adding $k - (l - 1)$ points from $\mathrm{supp}(\mu) \setminus \{z_1, \dots, z_{l-1}\}$ to $\{z_1, \dots, z_{l-1}\}$ would
result in a value of $F_{\mu,k}$ that is strictly below $F_{\mu,l-1}(\{z_1, \dots, z_{l-1}\})$, and from (2.15) this would
contradict the assumption that $\{z^m\}_{m \in \mathbb{N}}$ is a minimizing sequence. We conclude that $\{z^m\}_{m \in \mathbb{N}}$
converges up to a subsequence.

We now turn to comparing the properties of $F_{\mu,k}$ for different measures $\mu$; the ultimate goal is to
prove Theorem 1.8. Let $\mu$ and $\nu$ be Borel probability measures on $\mathbb{R}^N$ with finite second moments
and let $\pi \in \Gamma(\mu, \nu)$ be the optimal transportation plan realizing the Wasserstein distance between
$\mu$ and $\nu$, that is, assume

$d_2^2(\mu, \nu) = \iint_{\mathbb{R}^N \times \mathbb{R}^N} |x - y|^2 \, d\pi(x, y).$

Then

$|F_{\mu,k}(z) - F_{\nu,k}(z)| = \left| \int_{\mathbb{R}^N} d(x, z)^2 \, d\mu(x) - \int_{\mathbb{R}^N} d(y, z)^2 \, d\nu(y) \right|$

$= \left| \iint_{\mathbb{R}^N \times \mathbb{R}^N} \left( d(x, z)^2 - d(y, z)^2 \right) d\pi(x, y) \right|$

$\le \iint_{\mathbb{R}^N \times \mathbb{R}^N} \left| d(x, z)^2 - d(y, z)^2 \right| d\pi(x, y)$

$\le \iint_{\mathbb{R}^N \times \mathbb{R}^N} (|x - y| + d(y, z))^2 - d(y, z)^2 \, d\pi(x, y)$

$\le d_2^2(\mu, \nu) + 2 d_2(\mu, \nu) \sqrt{F_{\nu,k}(z)},$

where the last inequality is obtained after expanding the integrand and using the Cauchy-Schwarz
inequality. By symmetry, we conclude that

(2.16) $|F_{\mu,k}(z) - F_{\nu,k}(z)| \le d_2(\mu, \nu) \left( 2 \min \left\{ \sqrt{F_{\mu,k}(z)}, \sqrt{F_{\nu,k}(z)} \right\} + d_2(\mu, \nu) \right).$
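
The estimate (2.16) can be checked numerically in a simple setting. The sketch below assumes one-dimensional uniform empirical measures with the same number of atoms, for which the optimal coupling for the quadratic cost is the monotone (sorted) matching; this restriction and all function names are illustrative choices, not part of the paper.

```python
import math


def energy_1d(points, centers):
    """F_{mu,k}(z) for the uniform empirical measure on a list of real numbers."""
    return sum(min((x - z) ** 2 for z in centers) for x in points) / len(points)


def wasserstein2_1d(xs, ys):
    """d_2 between uniform empirical measures on equally many real atoms.

    In one dimension the sorted matching is optimal for the quadratic cost.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))


def stability_gap(xs, ys, centers):
    """Return (|F_mu - F_nu|, right hand side of (2.16)) for the given centers."""
    f_mu = energy_1d(xs, centers)
    f_nu = energy_1d(ys, centers)
    d2 = wasserstein2_1d(xs, ys)
    bound = d2 * (2.0 * min(math.sqrt(f_mu), math.sqrt(f_nu)) + d2)
    return abs(f_mu - f_nu), bound


gap, bound = stability_gap([0.0, 1.0, 4.0, 6.0], [0.5, 1.5, 3.0, 7.0], [1.0, 5.0])
print(gap, bound)
```

The bound is uniform in the centers only through the values of the energies, which is what makes it useful for comparing minimal values of $F_{\mu,k}$ and $F_{\nu,k}$ when the two measures are close in the Wasserstein distance.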

We also need the following Lemma. 

Lemma 2.10. Let $\mu$ be a Borel probability measure on $\mathbb{R}^N$ with finite second moment and at least $k$
points in its support. Let $z = (z_1, \dots, z_k)$ be a minimizer of the functional $F_{\mu,k}$. Denote by $V_1, \dots, V_k$
the (closed) Voronoi cells induced by the points $z_1, \dots, z_k$, i.e., $V_i = \{x \in \mathbb{R}^N : |x - z_i| = d(x, z)\}$.
Then,

$\mu(V_i \cap V_j) = 0, \quad \forall i \ne j.$


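Proof. We start by recalling that if $k = 1$ then the minimizer $z_1$ of $F_{\mu,1}$ is the centroid of $\mu$, that
is, $z_1 = \int x \, d\mu(x)$. We now consider $k \ge 2$. Since the support of $\mu$ has at least $k$ points, the points
$z_1, \dots, z_k$ are distinct. Assume that $\mu(V_i \cap V_j) > 0$ for some $i \ne j$. Note that the set $V_i \cap V_j$ is
contained in the plane $P_{ij}$ with normal vector $z_i - z_j$ and containing the point $\frac{1}{2} z_i + \frac{1}{2} z_j$. Let $\mu_i =
\mu \llcorner_{V_i}$ and $\theta_i = \mu - \mu_i$. Let $\hat{z} = (z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_k)$. Note that $F_{\mu,k}(z) = F_{\mu_i,1}(z_i) + F_{\theta_i,k-1}(\hat{z})$.
Consequently $z_i$ minimizes $F_{\mu_i,1}$ and, by the remark above, $z_i$ is the centroid of $\mu_i$, that is

$z_i = \frac{1}{\mu_i(\mathbb{R}^N)} \int_{\mathbb{R}^N} x \, d\mu_i(x).$

Now, let $\tilde{\mu}_i = \mu \llcorner_{V_i \setminus V_j}$ and $\tilde{\theta}_i = \mu - \tilde{\mu}_i$. Analogously to above, $F_{\mu,k}(z) = F_{\tilde{\mu}_i,1}(z_i) + F_{\tilde{\theta}_i,k-1}(\hat{z})$.
Hence $z_i$ minimizes $F_{\tilde{\mu}_i,1}$ and thus is the centroid of $\tilde{\mu}_i$, i.e.,

$z_i = \frac{1}{\tilde{\mu}_i(\mathbb{R}^N)} \int_{\mathbb{R}^N} x \, d\tilde{\mu}_i(x).$

But note that $\mu(V_i \cap V_j) > 0$ implies that

$\frac{1}{\mu_i(\mathbb{R}^N)} \int_{\mathbb{R}^N} (x - z_i) \cdot (z_j - z_i) \, d\mu_i(x) > \frac{1}{\tilde{\mu}_i(\mathbb{R}^N)} \int_{\mathbb{R}^N} (x - z_i) \cdot (z_j - z_i) \, d\tilde{\mu}_i(x).$

This contradicts the fact that the centroids of $\mu_i$ and $\tilde{\mu}_i$ are both equal to $z_i$.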
□ 


The proof of Theorem 1.8 is now a direct consequence of (2.16) and Lemma 2.10.


Proof of Theorem 1.8. Let $a_k = F_{\mu,k}(z^k)$ and $a_k^m = F_{\mu_m,k}(z_m^k)$, where $z^k$ is a minimizer of $F_{\mu,k}$ and where $z_m^k$ is a minimizer of $F_{\mu_m,k}$. Note that since the support of $\mu$ has at least $k$ points, $a_l > a_k$ for all $l < k$. From (2.16) it follows that $a_k^m \to a_k$ as $m \to \infty$ for any $k \in \mathbb{N}$. To show that $\{z_m^k, m \in \mathbb{N}\}$ is precompact it is enough to show that all coordinates are uniformly bounded. If this is not the case then there exists $1 \le l < k$ such that coordinates $1$ to $l$ converge, while those between $l+1$ and $k$ diverge to $\pm\infty$. Arguing as in the proof of the existence of minimizers at the beginning of this Section, and using (2.16), one obtains that $F_{\mu_m,k}(z_m^k)$ converges to $F_{\mu,l}(\{z_1,\dots,z_l\})$ for some $z_1,\dots,z_l \in \mathbb{R}^N$. If $l < k$, then this would imply that $a_l \le F_{\mu,l}(\{z_1,\dots,z_l\}) = \lim_{m\to\infty} a_k^m = a_k$, which would contradict the fact that $a_l > a_k$. We conclude that along a subsequence $z_m^k \to z^k$ for some $z^k$. To show that $z^k$ minimizes $F_{\mu,k}$ simply observe that from (2.16) and the continuity of $F_{\mu,k}$ it follows that
\[ F_{\mu,k}(z^k) = \lim_{m\to\infty} F_{\mu,k}(z_m^k) = \lim_{m\to\infty} F_{\mu_m,k}(z_m^k) = \lim_{m\to\infty} a_k^m = a_k, \]
which implies that indeed $z^k$ minimizes $F_{\mu,k}$.

Finally, the last part of the Theorem, on convergence of clusters, follows from the fact that $\mu_m$ converge weakly to $\mu$, that their second moments are uniformly bounded, and that the boundaries of Voronoi cells change continuously when the centers are perturbed. □
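
The centroid property used in the proof of Lemma 2.10 can be observed numerically on an empirical measure. The following is a minimal sketch, not the authors' procedure: it assumes that the k-means energy of an empirical measure is the mean squared distance to the nearest center, and it uses Lloyd's iteration, which is only a heuristic and need not reach a global minimizer. The names `kmeans_energy` and `lloyd` are hypothetical.

```python
import numpy as np

def kmeans_energy(points, centers):
    """Mean over the sample of the squared distance to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def lloyd(points, centers, iters=100):
    """Alternate Voronoi assignment and centroid update (Lloyd's heuristic)."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)          # Voronoi cell index of each point
        updated = centers.copy()
        for i in range(len(centers)):
            members = points[labels == i]
            if len(members) > 0:            # an empty cell keeps its center
                updated[i] = members.mean(axis=0)
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, labels

# Two well separated blobs; one starting center is taken from each blob.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
start = points[[0, 50]]
centers, labels = lloyd(points, start)
```

At a fixed point of the iteration each center is the centroid of the sample points in its Voronoi cell, which is the discrete counterpart of the identity $z_i = \frac{1}{\mu_i(\mathbb{R}^N)}\int x\,d\mu_i(x)$.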


2.4. The Spectra of $\mathcal{L}$, $\mathcal{N}^{sym}$ and $\mathcal{N}^{rw}$. The purpose of this section is to present some facts about the spectra of the operators $\mathcal{L}$, $\mathcal{N}^{sym}$, and $\mathcal{N}^{rw}$. These facts are standard (see [13], or Chapter 8 in [5]). We present them for the convenience of the reader.
Let $D$ be an open, bounded, connected domain with Lipschitz boundary and let $\rho : D \to \mathbb{R}$ be a continuous density function satisfying (1.15). For a given $w \in L^2(D)$ we consider the PDE


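\[ \mathcal{L}(u) = w \ \text{ in } D, \qquad \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial D, \tag{2.17} \]
where we recall $\mathcal{L}$ is formally defined as $\mathcal{L}(u) = -\frac{1}{\rho}\mathrm{div}(\rho^2\nabla u)$. We say that $u \in H^1(D)$ is a weak solution of (2.17) if
\[ \int_D \nabla u\cdot\nabla v\,\rho^2(x)\,dx = \int_D v\,w\,\rho(x)\,dx, \quad \forall v \in H^1(D). \tag{2.18} \]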


Remark 2.11. Note that if $u$ is a solution of (2.17) in the classical sense, then integration by parts shows that $u$ is a weak solution of (2.17).

A necessary condition for (2.17) to have a solution in the weak sense is that $w$ belongs to the space
\[ U := \Big\{ w \in L^2(D) : \int_D w\,\rho(x)\,dx = 0 \Big\}. \]
This can be deduced by considering the test function $v \equiv 1$ in (2.18). We consider the space
\[ V := \Big\{ v \in H^1(D) : \int_D v\,\rho(x)\,dx = 0 \Big\}, \]
and consider the bilinear form $a : V \times V \to \mathbb{R}$ given by
\[ a(u,v) := \int_D \nabla u\cdot\nabla v\,\rho^2\,dx. \tag{2.19} \]
One can use the assumptions on $\rho$ in (1.15) and Poincaré's inequality (see Theorem 12.23 in [27]) to show that $a$ is coercive with respect to the $H^1$ inner product on $V$, defined by
\[ \langle u,v\rangle_{H^1(D)} := \int_D uv\,dx + \int_D \nabla u\cdot\nabla v\,dx. \]
In addition, $a$ is continuous and symmetric.

Therefore, by the Lax-Milgram theorem [13][Sec. 6.2], for any $w \in U$ there exists a unique solution $u \in V$ to (2.17). From (2.18) and the assumption (1.15) on $\rho$, it follows that
\[ \int_D |\nabla u|^2\rho^2(x)\,dx \le C\int_D |w|^2\rho(x)\,dx, \tag{2.20} \]
for a constant $C$. We can then define the inverse $\mathcal{L}^{-1} : U \to V$ of $\mathcal{L}$ by letting $\mathcal{L}^{-1} : w \mapsto u$, where $u$ is the unique solution of (2.17). From (2.20), it follows that $\mathcal{L}^{-1}$ is a continuous linear function. The Rellich-Kondrachov theorem (see Theorem 11.10 in [27]) implies that $\mathcal{L}^{-1}$ is compact.

We say that $\lambda \in \mathbb{R}$ is an eigenvalue of the operator $\mathcal{L}$ if there exists a nontrivial $u \in H^1(D)$ which is a weak solution of (1.11). That is, if
\[ a(u,v) = \int_D \nabla u\cdot\nabla v\,\rho^2(x)\,dx = \lambda\int_D uv\,\rho(x)\,dx = \lambda\langle u,v\rangle_\rho, \quad \forall v \in H^1(D). \tag{2.21} \]
Such a function $u$ is called an eigenfunction.


Remark 2.12. We remark that $\lambda_1 = 0$ is an eigenvalue of $\mathcal{L}$ and that the function $u_1$ identically equal to one is an eigenfunction associated to $\lambda_1$. Given that $D$ is connected, it follows that the eigenspace associated to $\lambda_1 = 0$ is the space of constant functions on $D$. We also remark that $U$ is by definition the orthogonal complement (with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$) of $\mathrm{Span}\{u_1\}$.
Using the definition of $\mathcal{L}^{-1}$ and the definition of weak solutions to (2.17), it follows that
\[ u \text{ is an eigenfunction of } \mathcal{L} \text{ with eigenvalue } \lambda \neq 0 \quad\text{iff}\quad \mathcal{L}^{-1}(u) = \frac{1}{\lambda}u. \tag{2.22} \]
In other words, the non-constant eigenfunctions of $\mathcal{L}$ are the eigenfunctions of $\mathcal{L}^{-1}$, and the nonzero eigenvalues of $\mathcal{L}$ are the reciprocals of the eigenvalues of $\mathcal{L}^{-1}$. Thus, by understanding the structure of the spectrum of $\mathcal{L}^{-1}$, one can obtain properties of the spectrum of $\mathcal{L}$.

Proposition 2.13. The operator $\mathcal{L}^{-1} : V \to V$ is selfadjoint, positive semidefinite (with respect to the inner product $a(\cdot,\cdot)$) and compact. The eigenvalues of $\mathcal{L}^{-1}$ can be arranged as a decreasing sequence of positive numbers,
\[ \lambda_2^{-1} \ge \lambda_3^{-1} \ge \dots \]
repeated according to (finite) multiplicity and converging to zero. Moreover, there exists an orthonormal basis $\{v_k\}_{k\ge 2}$ of $V$, where each of the functions $v_k$ is an eigenfunction of $\mathcal{L}^{-1}$ with corresponding eigenvalue $\lambda_k^{-1}$.

Proof. In order to show that $\mathcal{L}^{-1} : V \to V$ is self-adjoint with respect to $a(\cdot,\cdot)$, take $v_1, v_2 \in V$ and let $u_i = \mathcal{L}^{-1}v_i$ for $i = 1,2$. We claim that
\[ a(\mathcal{L}^{-1}v_1, v_2) = \langle v_1, v_2\rangle_\rho. \]
In fact, from the definition of $\mathcal{L}^{-1}$ it follows that
\[ a(\mathcal{L}^{-1}v_1, v_2) = a(u_1, v_2) = \int_D \nabla u_1\cdot\nabla v_2\,\rho^2(x)\,dx = \int_D v_1v_2\,\rho(x)\,dx = \langle v_1, v_2\rangle_\rho. \]
From the previous identity, it immediately follows that $\mathcal{L}^{-1}$ is self-adjoint and positive semidefinite with respect to the inner product $a(\cdot,\cdot)$. The compactness of $\mathcal{L}^{-1}$ follows from the Rellich-Kondrachov theorem (see Theorem 11.10 in [27]). The statements about the spectrum of $\mathcal{L}^{-1}$ are a direct consequence of the Riesz-Schauder theorem and the Hilbert-Schmidt theorem (see [33]). □


For $k \ge 2$, let $v_k$ be eigenfunctions as in the previous proposition and define $u_k$ by
\[ u_k := \sqrt{\lambda_k}\,v_k. \tag{2.23} \]
We claim that $\{u_k\}_{k\ge 2}$ is an orthonormal base of $U$ with respect to $\langle\cdot,\cdot\rangle_\rho$. In fact, it follows from the definition of $\mathcal{L}^{-1}$ and (2.23) that
\[ \delta_{kl} = a(v_k, v_l) = \lambda_k\langle v_k, v_l\rangle_\rho = \frac{\sqrt{\lambda_k}}{\sqrt{\lambda_l}}\langle u_k, u_l\rangle_\rho, \]
where $\delta_{kl} = 1$ if $k = l$ and $\delta_{kl} = 0$ if $k \neq l$. Hence $\langle u_k, u_l\rangle_\rho = \delta_{kl}$. In other words, $\{u_k\}_{k\ge 2}$ is an orthonormal set. Completeness follows from the completeness in Proposition 2.13 and the density of $H^1(D)$ in $L^2(D)$.

By setting $u_1 \equiv 1$ and by noticing that $L^2(D) = \mathrm{Span}\{u_1\} \oplus U$, we conclude that $\{u_k\}_{k\in\mathbb{N}}$ is an orthonormal base for $L^2(D)$ with inner product $\langle\cdot,\cdot\rangle_\rho$. The next proposition is a direct consequence of the previous discussion and (2.22).


Proposition 2.14. $\mathcal{L}$ has a countable family of eigenvalues $\{\lambda_k\}_{k\in\mathbb{N}}$ which can be written as an increasing sequence of nonnegative numbers which tends to infinity as $k$ goes to infinity, that is,
\[ 0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_k \le \dots \]
Each eigenvalue is repeated according to (finite) multiplicity. Moreover, there exists an orthonormal basis $\{u_k\}_{k\in\mathbb{N}}$ of $L^2(D)$ (with respect to $\langle\cdot,\cdot\rangle_\rho$) such that for every $k \in \mathbb{N}$, $u_k$ is an eigenfunction of $\mathcal{L}$ associated to $\lambda_k$.
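
The weak eigenvalue problem (2.21) can be approximated numerically, which gives a concrete picture of the spectrum described in Proposition 2.14. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $D = (0,1)$ in one dimension, uses piecewise linear finite elements with a lumped mass matrix, and evaluates $\rho$ at element midpoints. The function name `weighted_neumann_eigs` and all parameters are hypothetical.

```python
import numpy as np

def weighted_neumann_eigs(rho, n=400, k=4):
    """Approximate the first k eigenpairs of
         int u' v' rho^2 dx = lam * int u v rho dx   on (0, 1),
    with natural (Neumann) boundary conditions, using P1 elements and mass lumping."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    mid = 0.5 * (x[:-1] + x[1:])
    stiff = rho(mid) ** 2 / h             # element stiffness weight (rho^2)
    mass = 0.5 * h * rho(mid)             # half of each element mass goes to each node
    idx = np.arange(n)
    K = np.zeros((n + 1, n + 1))
    K[idx, idx] += stiff
    K[idx + 1, idx + 1] += stiff
    K[idx, idx + 1] -= stiff
    K[idx + 1, idx] -= stiff
    m = np.zeros(n + 1)
    m[idx] += mass
    m[idx + 1] += mass
    s = 1.0 / np.sqrt(m)
    A = s[:, None] * K * s[None, :]       # symmetric form M^{-1/2} K M^{-1/2}
    lam, w = np.linalg.eigh(A)
    vecs = s[:, None] * w                 # eigenvectors of K u = lam M u
    return lam[:k], vecs[:, :k]

# Uniform density: the exact eigenvalues are 0, pi^2, (2 pi)^2, ...
lam, vecs = weighted_neumann_eigs(lambda t: np.ones_like(t))
```

For the uniform density the computed values approximate $0, \pi^2, (2\pi)^2, \dots$, and the first eigenvector is constant, in line with Remark 2.12.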


Finally we present the Courant-Fisher maxmini principle.
Proposition 2.15. Consider an orthonormal base $\{u_k\}_{k\in\mathbb{N}}$ for $L^2(D)$ with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$, where for each $k \in \mathbb{N}$, $u_k$ is an eigenfunction of $\mathcal{L}$ with eigenvalue $\lambda_k$. Then, for every $k \in \mathbb{N}$,
\[ \lambda_k = \min_{\|u\|_\rho = 1,\ u \in S^{*\perp}} G(u), \tag{2.24} \]
where $S^* = \mathrm{Span}\{u_1,\dots,u_{k-1}\}$ and where $S^{*\perp}$ denotes the orthogonal complement of $S^*$ with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$. Additionally,
\[ \lambda_k = \sup_{S \in \Sigma_{k-1}}\ \min_{\|u\|_\rho = 1,\ u \in S^\perp} G(u), \tag{2.25} \]
where $\Sigma_{k-1}$ denotes the set of $(k-1)$-dimensional subspaces of $L^2(D)$, and where $S^\perp$ represents the orthogonal complement of $S$ with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$.

The proof (of a similar statement) can be found in Chapter 8.3 in [5]. 
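
A finite-dimensional analogue of (2.24) and (2.25) can be checked directly. The sketch below is only an illustration: it replaces $G$ by the quadratic form of a symmetric matrix `A` and the inner product $\langle\cdot,\cdot\rangle_\rho$ by the Euclidean one, and the helper name `min_rayleigh_on_complement` is hypothetical.

```python
import numpy as np

def min_rayleigh_on_complement(A, S):
    """Minimum of u^T A u over unit vectors u orthogonal to the columns of S."""
    n = A.shape[0]
    if S.shape[1] == 0:
        Q = np.eye(n)
    else:
        full_q, _ = np.linalg.qr(S, mode="complete")
        Q = full_q[:, S.shape[1]:]        # orthonormal basis of the complement of span(S)
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]

rng = np.random.default_rng(1)
B = rng.normal(size=(8, 8))
A = B @ B.T                               # symmetric positive semidefinite test matrix
lam, U = np.linalg.eigh(A)                # ascending eigenvalues, orthonormal eigenvectors
k = 3

# Analogue of (2.24): S* is spanned by the first k - 1 eigenvectors.
at_eigenspace = min_rayleigh_on_complement(A, U[:, :k - 1])

# Analogue of (2.25): no other (k - 1)-dimensional subspace gives a larger minimum.
random_trials = [min_rayleigh_on_complement(A, rng.normal(size=(8, k - 1)))
                 for _ in range(200)]
```

The value at the eigenspace equals the $k$-th eigenvalue, while every random subspace yields a value that does not exceed it, which is the content of the sup-min formula.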

Remark 2.16. If the density $\rho$ is smooth, then the eigenfunctions of $\mathcal{L}$ are smooth inside $D$.


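We now turn to the spectrum of $\mathcal{N}^{sym}$. We say that $\tau \in \mathbb{R}$ is an eigenvalue of the operator $\mathcal{N}^{sym}$ if there exists a nontrivial $u$ with $u/\sqrt{\rho} \in H^1(D)$ which solves (1.13). That is, if
\[ \int_D \nabla\Big(\frac{u}{\sqrt{\rho}}\Big)\cdot\nabla\Big(\frac{v}{\sqrt{\rho}}\Big)\rho^2(x)\,dx = \tau\int_D uv\,\rho(x)\,dx \quad \text{for all } v \text{ with } v/\sqrt{\rho} \in H^1(D). \tag{2.26} \]
The function $u$ is then called an eigenfunction of $\mathcal{N}^{sym}$ with eigenvalue $\tau$.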


Remark 2.17. We remark that $\tau_1 = 0$ is an eigenvalue of $\mathcal{N}^{sym}$ and that the function
\[ u_1(x) = \frac{\sqrt{\rho(x)}}{\|\sqrt{\rho}\|_\rho} \]
is an eigenfunction of $\mathcal{N}^{sym}$ with eigenvalue $\tau_1 = 0$. Given that $D$ is connected, it actually follows that $\tau_1 = 0$ has multiplicity one and thus the eigenspace associated to $\tau_1 = 0$ is the space of multiples of $\sqrt{\rho}$.

Following the same ideas used when considering the spectrum of $\mathcal{L}$, we can establish the following analogous results.


Proposition 2.18. $\mathcal{N}^{sym}$ has a countable family of eigenvalues $\{\tau_k\}_{k\in\mathbb{N}}$ which can be written as an increasing sequence of nonnegative numbers which tends to infinity as $k$ goes to infinity, that is,
\[ 0 = \tau_1 \le \tau_2 \le \dots \le \tau_k \le \dots \]
Each eigenvalue is repeated according to (finite) multiplicity. Moreover, there exists an orthonormal basis $\{u_k\}_{k\in\mathbb{N}}$ of $L^2(D)$ (with respect to $\langle\cdot,\cdot\rangle_\rho$) such that for every $k \in \mathbb{N}$, $u_k$ is an eigenfunction of $\mathcal{N}^{sym}$ associated to $\tau_k$.

Proposition 2.19. Consider an orthonormal base $\{u_k\}_{k\in\mathbb{N}}$ for $L^2(D)$ with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$, where for each $k \in \mathbb{N}$, $u_k$ is an eigenfunction of $\mathcal{N}^{sym}$ with eigenvalue $\tau_k$. Then, for every $k \in \mathbb{N}$,
\[ \tau_k = \min_{\|u\|_\rho = 1,\ u \in S^{*\perp}} \overline{G}(u), \tag{2.27} \]
where $S^* = \mathrm{Span}\{u_1,\dots,u_{k-1}\}$. Additionally,
\[ \tau_k = \sup_{S \in \Sigma_{k-1}}\ \min_{\|u\|_\rho = 1,\ u \in S^\perp} \overline{G}(u), \]
where $\Sigma_{k-1}$ denotes the set of $(k-1)$-dimensional subspaces of $L^2(D)$, and where $S^\perp$ represents the orthogonal complement of $S$ with respect to the inner product $\langle\cdot,\cdot\rangle_\rho$.
Finally, we consider the spectrum of $\mathcal{N}^{rw}$. We say that $\tau \in \mathbb{R}$ is an eigenvalue of the operator $\mathcal{N}^{rw}$ if there exists a nontrivial $u \in H^1(D)$ for which
\[ \int_D \nabla u\cdot\nabla v\,\rho^2(x)\,dx = \tau\int_D uv\,\rho^2(x)\,dx, \quad \forall v \in H^1(D). \tag{2.28} \]
The function $u$ is then called an eigenfunction of $\mathcal{N}^{rw}$ with eigenvalue $\tau$. From the definition, it follows that $\tau$ is an eigenvalue of $\mathcal{N}^{rw}$ with eigenfunction $u$ if and only if $\tau$ is an eigenvalue of $\mathcal{N}^{sym}$ with eigenvector $w := \sqrt{\rho}\,u$. This is analogous to (1.4) in the discrete case.
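
The discrete counterpart of this correspondence is easy to verify numerically. The sketch below assumes the standard normalizations $D^{-1/2}LD^{-1/2}$ and $D^{-1}L$ of a weighted graph Laplacian $L = D - W$; the weight matrix is random and purely illustrative, and the variable names are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
W = rng.uniform(0.1, 1.0, size=(n, n))
W = 0.5 * (W + W.T)                       # symmetric positive weights
np.fill_diagonal(W, 0.0)
deg = W.sum(axis=1)
L = np.diag(deg) - W                      # unnormalized graph Laplacian
N_sym = L / np.sqrt(np.outer(deg, deg))   # D^{-1/2} L D^{-1/2}
N_rw = L / deg[:, None]                   # D^{-1} L

tau, w = np.linalg.eigh(N_sym)            # eigenpairs of the symmetric normalization
u = w / np.sqrt(deg)[:, None]             # u = D^{-1/2} w, i.e. w = D^{1/2} u
residual = np.abs(N_rw @ u - u * tau).max()
```

Each column of `u` is an eigenvector of the random walk normalization with the same eigenvalue as the corresponding column of `w`, mirroring the relation $w = \sqrt{\rho}\,u$ in the continuum.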


3. Convergence of the spectra of unnormalized graph Laplacians


We start by establishing Theorem 1.4.


Proof of Theorem 1.4. As done in Section 5 in m and due to the assumptions (K1)-(K3) on $\eta$, we can reduce the problem to that of showing the result for the kernel $\eta$ defined by
\[ \eta(t) := \begin{cases} 1, & \text{if } t \in [0,1],\\ 0, & \text{if } t > 1. \end{cases} \]
We use the sequence of transportation maps $\{T_n\}_{n\in\mathbb{N}}$ from Proposition 1.12. Let $\omega \in \Omega$ be such that (1.27) and (1.28) hold in cases $d = 2$ and $d \ge 3$ respectively. By Proposition 1.12 the complement in $\Omega$ of such $\omega$'s is contained in a set of probability zero. The key idea in the proof is that the estimates of Proposition 1.12 imply that the transportation happens on a length scale which is small compared to $\varepsilon_n$. By taking a kernel with slightly smaller radius than $\varepsilon_n$ we can then obtain a lower bound, and by taking a slightly larger radius a matching upper bound on the functional $G_{n,\varepsilon_n}$.

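Liminf inequality: Assume that $u_n \xrightarrow{TL^2} u$ as $n \to \infty$. Since $T_{n\sharp}\nu = \nu_n$, using the change of variables (2.3) it follows that
\[ G_{n,\varepsilon_n}(u_n) = \frac{1}{\varepsilon_n^2}\int_{D\times D}\eta_{\varepsilon_n}\big(T_n(x) - T_n(y)\big)\big(u_n\circ T_n(x) - u_n\circ T_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy. \tag{3.1} \]
Note that for $(x,y) \in D\times D$
\[ |T_n(x) - T_n(y)| > \varepsilon_n \ \Longrightarrow\ |x - y| > \varepsilon_n - 2\|Id - T_n\|_\infty. \tag{3.2} \]
Thanks to the assumptions on $\{\varepsilon_n\}_{n\in\mathbb{N}}$ ((1.27) and (1.28) in cases $d = 2$ and $d \ge 3$ respectively), for large enough $n \in \mathbb{N}$:
\[ \tilde\varepsilon_n := \varepsilon_n - 2\|Id - T_n\|_\infty > 0. \tag{3.3} \]
By (3.2) and our choice of kernel $\eta$, for large enough $n$ and for every $(x,y) \in D\times D$, we obtain
\[ \eta\Big(\frac{|x - y|}{\tilde\varepsilon_n}\Big) \le \eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big). \]
We now consider $\tilde u_n = u_n\circ T_n$. Thanks to the previous inequality and (3.1), for large enough $n$
\[ G_{n,\varepsilon_n}(u_n) \ge \frac{1}{\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|x - y|}{\tilde\varepsilon_n}\Big)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy = \Big(\frac{\tilde\varepsilon_n}{\varepsilon_n}\Big)^{d+2}G_{\tilde\varepsilon_n}(\tilde u_n). \]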


Note that $\tilde\varepsilon_n/\varepsilon_n \to 1$ as $n \to \infty$ and that $u_n \xrightarrow{TL^2} u$ by definition implies $\tilde u_n \to u$ in $L^2(\nu)$ as $n \to \infty$. We deduce from the liminf inequality of Proposition 1.10 that $\liminf_{n\to\infty} G_{\tilde\varepsilon_n}(\tilde u_n) \ge \sigma_\eta G(u)$ and hence:
\[ \liminf_{n\to\infty} G_{n,\varepsilon_n}(u_n) \ge \sigma_\eta G(u). \]
Limsup inequality: By using a diagonal argument it is enough to establish the limsup inequality for a dense subset of $L^2(D)$, and in particular we consider the set of Lipschitz continuous functions $u : D \to \mathbb{R}$. That is, we want to show that if $u : D \to \mathbb{R}$ is a Lipschitz continuous function, then there exists a sequence of functions $\{u_n\}_{n\in\mathbb{N}}$, where $u_n \in L^2(\nu_n)$ and
\[ u_n \xrightarrow{TL^2} u \ \text{ as } n \to \infty, \qquad \limsup_{n\to\infty} G_{n,\varepsilon_n}(u_n) \le \sigma_\eta G(u). \]
We define $u_n$ to be the restriction of $u$ to the first $n$ data points $x_1,\dots,x_n$. We note that this operation is well defined due to the fact that $u$ is in particular continuous. It is straightforward to show that, given that $u$ is Lipschitz, we have $u_n \xrightarrow{TL^2} u$.

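Now, consider $\hat\varepsilon_n := \varepsilon_n + 2\|Id - T_n\|_\infty$ and let $\tilde u_n = u\circ T_n$. The choice of kernel $\eta$ implies that for every $(x,y) \in D\times D$
\[ \eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big) \le \eta\Big(\frac{|x - y|}{\hat\varepsilon_n}\Big). \]
It follows that for all $n \in \mathbb{N}$
\[ \frac{1}{\hat\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy \le \frac{1}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy. \tag{3.4} \]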


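Now let $A_n$ and $B_n$ be given by
\[ A_n := \frac{1}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(u(x) - u(y)\big)^2\rho(x)\rho(y)\,dx\,dy, \]
\[ B_n := \frac{1}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy, \]
where $\hat\varepsilon_n = \varepsilon_n + 2\|Id - T_n\|_\infty$ and $\tilde u_n = u\circ T_n$. Then,
\[
\begin{aligned}
\big(\sqrt{A_n} - \sqrt{B_n}\big)^2 &\le \frac{1}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(u(x) - \tilde u_n(x) + \tilde u_n(y) - u(y)\big)^2\rho(x)\rho(y)\,dx\,dy\\
&\le \frac{4}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(u(x) - \tilde u_n(x)\big)^2\rho(x)\rho(y)\,dx\,dy\\
&\le \frac{4C\,\mathrm{Lip}(u)^2\,\|\rho\|_{L^\infty(D)}\,\|Id - T_n\|_\infty^2}{\hat\varepsilon_n^2},
\end{aligned}
\tag{3.5}
\]
where the first inequality follows using Minkowski's inequality, and where $C = \int_{\mathbb{R}^d}\eta(h)\,dh$. The last term of the previous expression goes to $0$ as $n \to \infty$, yielding
\[ \lim_{n\to\infty}\big|\sqrt{A_n} - \sqrt{B_n}\big| = 0. \]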


On the other hand, by (1.26) it follows that $A_n$ is bounded in $n$, and in particular it follows that
\[ \lim_{n\to\infty}|A_n - B_n| = 0. \tag{3.6} \]
We conclude that
\[
\begin{aligned}
\limsup_{n\to\infty} G_{n,\varepsilon_n}(u_n) &= \limsup_{n\to\infty}\frac{1}{\hat\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy\\
&\le \limsup_{n\to\infty}\frac{1}{\hat\varepsilon_n^2}\int_{D\times D}\eta_{\hat\varepsilon_n}(x - y)\big(\tilde u_n(x) - \tilde u_n(y)\big)^2\rho(x)\rho(y)\,dx\,dy\\
&= \limsup_{n\to\infty} G_{\hat\varepsilon_n}(u) = \sigma_\eta G(u),
\end{aligned}
\]
where the first equality is obtained from the fact that $\varepsilon_n/\hat\varepsilon_n \to 1$ as $n \to \infty$, the first inequality is obtained from (3.4), the second equality is obtained from (3.6) and the last equality is obtained from (1.26).

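Compactness: Finally, to see that the compactness statement holds, suppose that $\{u_n\}_{n\in\mathbb{N}}$ is a sequence with $u_n \in L^2(\nu_n)$ and such that
\[ \sup_{n\in\mathbb{N}}\|u_n\|_{L^2(\nu_n)} < \infty, \qquad \sup_{n\in\mathbb{N}} G_{n,\varepsilon_n}(u_n) < \infty. \]
Note that in particular $\sup_{n\in\mathbb{N}}\|u_n\circ T_n\|_{L^2(\nu)} < \infty$. We want to show that
\[ \sup_{n\in\mathbb{N}} G_{\tilde\varepsilon_n}(u_n\circ T_n) < \infty. \]
To see this, note that for large enough $n$ we can set $\tilde\varepsilon_n := \varepsilon_n - 2\|Id - T_n\|_\infty$ as in (3.3). Thus, for large enough $n$:
\[
\begin{aligned}
&\frac{1}{\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|z - y|}{\tilde\varepsilon_n}\Big)\big(u_n\circ T_n(z) - u_n\circ T_n(y)\big)^2\rho(z)\rho(y)\,dz\,dy\\
&\qquad\le \frac{1}{\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|T_n(z) - T_n(y)|}{\varepsilon_n}\Big)\big(u_n\circ T_n(z) - u_n\circ T_n(y)\big)^2\rho(z)\rho(y)\,dz\,dy = G_{n,\varepsilon_n}(u_n).
\end{aligned}
\]
Thus
\[ \sup_{n\in\mathbb{N}}\frac{1}{\varepsilon_n^{d+2}}\int_{D\times D}\eta\Big(\frac{|z - y|}{\tilde\varepsilon_n}\Big)\big(u_n\circ T_n(z) - u_n\circ T_n(y)\big)^2\rho(z)\rho(y)\,dz\,dy < \infty. \]
Finally, noting that $\tilde\varepsilon_n/\varepsilon_n \to 1$ as $n \to \infty$, we deduce that:
\[ \sup_{n\in\mathbb{N}} G_{\tilde\varepsilon_n}(u_n\circ T_n) < \infty. \]
By Proposition 1.10 we conclude that $\{u_n\circ T_n\}_{n\in\mathbb{N}}$ is relatively compact in $L^2(\nu)$ and hence $\{u_n\}_{n\in\mathbb{N}}$ is relatively compact in $TL^2$. □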

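Now we prove Theorem 1.2.

3.1. Convergence of Eigenvalues. First of all note that because $\mathcal{L}_{n,\varepsilon_n}$ is self-adjoint with respect to the Euclidean inner product in $\mathbb{R}^n$, in particular it is also self-adjoint with respect to the inner product $\langle\cdot,\cdot\rangle_{\nu_n}$ and furthermore, it is positive semi-definite. In particular, we can use the Courant-Fisher maxmini principle to write the eigenvalues $0 = \lambda_1^{(n)} \le \dots \le \lambda_n^{(n)}$ of $\mathcal{L}_{n,\varepsilon_n}$ as
\[ \lambda_k^{(n)} = \sup_{S\in\Sigma^{(n)}_{k-1}}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S^\perp}\langle\mathcal{L}_{n,\varepsilon_n}u, u\rangle_{\nu_n}, \]
where $\Sigma^{(n)}_{k-1}$ denotes the set of subspaces of $\mathbb{R}^n$ of dimension $k-1$. On the other hand, for any $u_n \in L^2(\nu_n)$, from the definition of the graph Laplacian it follows that
\[ G_{n,\varepsilon_n}(u_n) = \frac{2}{n\varepsilon_n^2}\langle\mathcal{L}_{n,\varepsilon_n}u_n, u_n\rangle_{\nu_n}. \tag{3.7} \]
Therefore,
\[ \frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} = \sup_{S\in\Sigma^{(n)}_{k-1}}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S^\perp} G_{n,\varepsilon_n}(u). \]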
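
Identity (3.7) is a purely algebraic statement about the graph and can be checked on a random sample. The sketch below assumes that $G_{n,\varepsilon}(u) = \frac{1}{\varepsilon^2n^2}\sum_{i,j}W_{ij}(u(x_i) - u(x_j))^2$ with $W_{ij} = \eta_\varepsilon(x_i - x_j)$, $\eta_\varepsilon(z) = \varepsilon^{-d}\eta(|z|/\varepsilon)$, and $\mathcal{L}_{n,\varepsilon} = D - W$; these definitions come from earlier parts of the paper not reproduced here, so they are labeled as assumptions, as are the helper names.

```python
import numpy as np

def graph_quantities(x, eps):
    """Weights eta_eps(x_i - x_j) for the indicator kernel, and the Laplacian L = D - W."""
    n, d = x.shape
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    W = (dist <= eps).astype(float) / eps**d
    L = np.diag(W.sum(axis=1)) - W
    return W, L

def dirichlet_energy(x, u, eps):
    """Assumed form of G_{n,eps}(u): (1 / (eps^2 n^2)) sum_{i,j} W_ij (u_i - u_j)^2."""
    W, _ = graph_quantities(x, eps)
    n = len(u)
    return (W * (u[:, None] - u[None, :]) ** 2).sum() / (eps**2 * n**2)

rng = np.random.default_rng(3)
x = rng.uniform(size=(200, 2))
u = np.sin(2 * np.pi * x[:, 0])
eps = 0.2
_, L = graph_quantities(x, eps)
n = len(u)
lhs = dirichlet_energy(x, u, eps)
rhs = 2.0 / (n * eps**2) * (u @ (L @ u)) / n   # (2 / (n eps^2)) <L u, u>_{nu_n}
```

Both sides agree up to rounding, which is why the rescaled eigenvalues $2\lambda_k^{(n)}/(n\varepsilon_n^2)$ are the natural quantities to compare with the continuum spectrum.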

Let us first prove the first statement from Theorem 1.2, namely that $\lim_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} = \sigma_\eta\lambda_k$. The proof is by induction on $k$. For $k = 1$, we know that $\lambda_1^{(n)} = 0$ for every $n$. Also, $\lambda_1 = 0$, so the claim is trivially true when $k = 1$. Now, suppose that the claim is true for $i = 1,\dots,k-1$. We want to prove that the result holds for $k$.
Step 1: In this first step we prove that $\sigma_\eta\lambda_k \le \liminf_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2}$. Let $S \in \Sigma_{k-1}$ and let $\{u_1,\dots,u_{k-1}\}$ be an orthonormal base for $S$. Then, for every $i = 1,\dots,k-1$, there exists a sequence $\{u_i^n\}_{n\in\mathbb{N}}$ (with $u_i^n \in L^2(\nu_n)$) such that $u_i^n \xrightarrow{TL^2} u_i$ as $n \to \infty$. The existence of such a sequence follows from the limsup inequality of Theorem 1.4. Proposition 2.6 implies that for all $i = 1,\dots,k-1$
\[ \lim_{n\to\infty}\|u_i^n\|_{\nu_n} = \|u_i\|_\rho = 1, \]
and that for $i \neq j$
\[ \lim_{n\to\infty}\langle u_i^n, u_j^n\rangle_{\nu_n} = \langle u_i, u_j\rangle_\rho = 0. \tag{3.8} \]

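Thus, for large enough $n$, the space generated by $\{u_1^n,\dots,u_{k-1}^n\}$ is $(k-1)$-dimensional. We can use the Gram-Schmidt orthogonalization process to obtain an orthonormal base $\{\tilde u_1^n,\dots,\tilde u_{k-1}^n\}$. That is, we define $\tilde u_1^n := u_1^n/\|u_1^n\|_{\nu_n}$, and recursively
\[ \tilde v_i^n := u_i^n - \sum_{j=1}^{i-1}\langle u_i^n, \tilde u_j^n\rangle_{\nu_n}\tilde u_j^n \quad\text{and}\quad \tilde u_i^n := \frac{\tilde v_i^n}{\|\tilde v_i^n\|_{\nu_n}} \]
for $i = 2,\dots,k-1$.

It follows from (3.8) and Proposition 2.6 that $\tilde u_i^n \xrightarrow{TL^2} u_i$ as $n \to \infty$ for every $i = 1,\dots,k-1$. Let $S_n := \mathrm{Span}\{\tilde u_1^n,\dots,\tilde u_{k-1}^n\}$. We claim that
\[ \liminf_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} \ge \min_{\|u\|_\rho = 1,\ u\in S^\perp}\sigma_\eta G(u). \tag{3.9} \]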
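
The Gram-Schmidt step used to build $S_n$ is carried out in the inner product $\langle u,v\rangle_{\nu_n} = \frac1n\sum_i u(x_i)v(x_i)$ rather than the Euclidean one. A minimal sketch follows; the function names are hypothetical, and the loop uses the modified variant of the process, which agrees with the classical recursion in exact arithmetic.

```python
import numpy as np

def inner(u, v):
    """<u, v>_{nu_n} = (1/n) sum_i u(x_i) v(x_i)."""
    return float(np.mean(u * v))

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors` with respect to <.,.>_{nu_n}."""
    basis = []
    for u in vectors:
        v = u.astype(float).copy()
        for b in basis:
            v -= inner(v, b) * b          # project out the directions found so far
        norm = np.sqrt(inner(v, v))
        if norm < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        basis.append(v / norm)
    return np.array(basis)

# Three functions sampled at 50 points play the role of u_1^n, u_2^n, u_3^n.
x = np.linspace(0.0, 1.0, 50)
raw = np.stack([np.ones_like(x), x, x**2])
ortho = gram_schmidt(raw)
gram = ortho @ ortho.T / len(x)           # Gram matrix in <.,.>_{nu_n}
```

The resulting Gram matrix is the identity up to rounding, so the rows form an orthonormal base of their span in $L^2(\nu_n)$.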


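First, note that if
\[ \liminf_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) = \infty, \]
then in particular
\[ \liminf_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} \ge \liminf_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) = \infty, \]
and in that case (3.9) follows trivially. Let us now assume that $\liminf_{n\to\infty}\min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) < \infty$. Working on a subsequence that we do not relabel, we can assume without loss of generality that the liminf is actually a limit, that is,
\[ \lim_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) = \liminf_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) < \infty. \]
Consider now a sequence $\{v_n\}_{n\in\mathbb{N}}$ with $\|v_n\|_{\nu_n} = 1$ and $v_n \in S_n^\perp$ such that
\[ \lim_{n\to\infty} G_{n,\varepsilon_n}(v_n) = \lim_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u) < \infty. \]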

Using the compactness from Theorem 1.4 and working on a subsequence that we do not relabel, we may assume that
\[ v_n \xrightarrow{TL^2} v, \ \text{ as } n \to \infty, \tag{3.10} \]
for some $v \in L^2(D)$. From Proposition 2.6, $\|v\|_\rho = \lim_{n\to\infty}\|v_n\|_{\nu_n} = 1$ and $\langle v, u_i\rangle_\rho = \lim_{n\to\infty}\langle v_n, \tilde u_i^n\rangle_{\nu_n} = 0$ for every $i = 1,\dots,k-1$. In particular, $\|v\|_\rho = 1$ and $v \in S^\perp$. Moreover, given that $v_n \xrightarrow{TL^2} v$, it follows from the liminf inequality of Theorem 1.4 that
\[
\begin{aligned}
\min_{\|u\|_\rho = 1,\ u\in S^\perp}\sigma_\eta G(u) &\le \sigma_\eta G(v)\\
&\le \liminf_{n\to\infty} G_{n,\varepsilon_n}(v_n)\\
&= \lim_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^\perp} G_{n,\varepsilon_n}(u)\\
&\le \liminf_{n\to\infty}\ \sup_{S\in\Sigma^{(n)}_{k-1}}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S^\perp} G_{n,\varepsilon_n}(u)\\
&= \liminf_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2}.
\end{aligned}
\]
This shows (3.9) in all cases. Finally, since $S \in \Sigma_{k-1}$ was arbitrary, taking the supremum over all $S \in \Sigma_{k-1}$ and using the Courant-Fisher maxmini principle we deduce that
\[ \sigma_\eta\lambda_k \le \liminf_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2}. \]

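Step 2: Now we prove that $\limsup_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} \le \sigma_\eta\lambda_k$. Consider $\{u_1^n,\dots,u_{k-1}^n\}$ an orthonormal set (with respect to $\langle\cdot,\cdot\rangle_{\nu_n}$) with $u_i^n$ an eigenvector of $\mathcal{L}_{n,\varepsilon_n}$ associated to $\lambda_i^{(n)}$ (this is possible because $\mathcal{L}_{n,\varepsilon_n}$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle_{\nu_n}$). Consider then $S_n^* := \mathrm{Span}\{u_1^n,\dots,u_{k-1}^n\}$. We have:
\[ \frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} = \sup_{S\in\Sigma^{(n)}_{k-1}}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S^\perp} G_{n,\varepsilon_n}(u) = \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^{*\perp}} G_{n,\varepsilon_n}(u). \]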

Working along a subsequence that we do not relabel, we can assume without loss of generality that $\limsup_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} = \lim_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2}$. Note that by the induction hypothesis, for every $i = 1,\dots,k-1$ we have:
\[ \lim_{n\to\infty} G_{n,\varepsilon_n}(u_i^n) = \lim_{n\to\infty}\frac{2\lambda_i^{(n)}}{n\varepsilon_n^2} = \sigma_\eta\lambda_i < \infty. \tag{3.11} \]
Thanks to this, we can use the compactness from Theorem 1.4 to conclude that for every $i = 1,\dots,k-1$ (working with a subsequence that we do not relabel):
\[ u_i^n \xrightarrow{TL^2} u_i, \ \text{ as } n \to \infty, \]
for some $u_i \in L^2(D)$. From Proposition 2.6, $\langle u_i, u_j\rangle_\rho = \lim_{n\to\infty}\langle u_i^n, u_j^n\rangle_{\nu_n} = 0$ for $i \neq j$ and $\|u_i\|_\rho = \lim_{n\to\infty}\|u_i^n\|_{\nu_n} = 1$ for every $i$. Take $S := \mathrm{Span}\{u_1,\dots,u_{k-1}\}$ and note that in particular $S \in \Sigma_{k-1}$. Also, take $v \in S^\perp$ with $\|v\|_\rho = 1$ and such that:
\[ \sigma_\eta G(v) = \min_{\|u\|_\rho = 1,\ u\in S^\perp}\sigma_\eta G(u) \le \sigma_\eta\lambda_k. \tag{3.12} \]
The last inequality in the previous expression holds thanks to the Courant-Fisher maxmini principle.

By the limsup inequality from Theorem 1.4 we can find $\{v_n\}_{n\in\mathbb{N}}$ with $v_n \xrightarrow{TL^2} v$ as $n \to \infty$ and such that $\limsup_{n\to\infty} G_{n,\varepsilon_n}(v_n) \le \sigma_\eta G(v)$. Let $\tilde v_n$ be given by
\[ \tilde v_n := v_n - \sum_{i=1}^{k-1}\langle v_n, u_i^n\rangle_{\nu_n}u_i^n. \]










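Note that $\tilde v_n \in S_n^{*\perp}$. Also note that from Proposition 2.6 we deduce that $\langle v_n, u_i^n\rangle_{\nu_n} \to 0$ for all $i = 1,\dots,k-1$ and thus $\tilde v_n \xrightarrow{TL^2} v$ as $n \to \infty$. Moreover,
\[
\begin{aligned}
G_{n,\varepsilon_n}(\tilde v_n) &= \frac{2}{n\varepsilon_n^2}\langle\mathcal{L}_{n,\varepsilon_n}\tilde v_n, \tilde v_n\rangle_{\nu_n}\\
&= \frac{2}{n\varepsilon_n^2}\langle\mathcal{L}_{n,\varepsilon_n}v_n, v_n\rangle_{\nu_n} - \frac{2}{n\varepsilon_n^2}\sum_{i=1}^{k-1}\langle v_n, u_i^n\rangle_{\nu_n}\langle\mathcal{L}_{n,\varepsilon_n}v_n, u_i^n\rangle_{\nu_n}\\
&= G_{n,\varepsilon_n}(v_n) - \sum_{i=1}^{k-1}\frac{2\lambda_i^{(n)}}{n\varepsilon_n^2}\langle v_n, u_i^n\rangle_{\nu_n}^2\\
&\le G_{n,\varepsilon_n}(v_n).
\end{aligned}
\tag{3.13}
\]
Therefore,
\[ \limsup_{n\to\infty} G_{n,\varepsilon_n}(\tilde v_n) \le \limsup_{n\to\infty} G_{n,\varepsilon_n}(v_n) \le \sigma_\eta G(v). \tag{3.14} \]
Since $\tilde v_n \xrightarrow{TL^2} v$ and $\|v\|_\rho = 1$, once again from Proposition 2.6 we obtain $\lim_{n\to\infty}\|\tilde v_n\|_{\nu_n} = 1$. In particular we can set $\hat u_n := \tilde v_n/\|\tilde v_n\|_{\nu_n}$ and use (3.14) together with (3.12) to conclude that:
\[
\begin{aligned}
\lim_{n\to\infty}\frac{2\lambda_k^{(n)}}{n\varepsilon_n^2} &= \lim_{n\to\infty}\ \min_{\|u\|_{\nu_n} = 1,\ u\in S_n^{*\perp}} G_{n,\varepsilon_n}(u)\\
&\le \limsup_{n\to\infty} G_{n,\varepsilon_n}(\hat u_n)\\
&= \limsup_{n\to\infty} G_{n,\varepsilon_n}(\tilde v_n)\\
&\le \sigma_\eta G(v) \le \sigma_\eta\lambda_k,
\end{aligned}
\]
which implies the desired result.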


3.2. Convergence of Eigenprojections. We prove the second and third part of Theorem 1.2. We recall that the numbers $\bar\lambda_1 < \bar\lambda_2 < \dots$ denote the distinct eigenvalues of $\mathcal{L}$. For a given $k \in \mathbb{N}$, we recall that $s(k)$ is the multiplicity of the eigenvalue $\bar\lambda_k$ and that $\hat k \in \mathbb{N}$ is such that
\[ \bar\lambda_k = \lambda_{\hat k+1} = \dots = \lambda_{\hat k+s(k)}. \]
We let $E_k$ be the subspace of $L^2(D)$ of eigenfunctions of $\mathcal{L}$ associated to $\bar\lambda_k$, and for large $n$ we let $E_k^{(n)}$ be the subspace of $\mathbb{R}^n$ generated by all the eigenvectors of $\mathcal{L}_{n,\varepsilon_n}$ corresponding to all eigenvalues listed in $\lambda^{(n)}_{\hat k+1},\dots,\lambda^{(n)}_{\hat k+s(k)}$. We remark that by the convergence of the eigenvalues proved in Subsection 3.1 we have
\[ \lim_{n\to\infty}\dim(E_k^{(n)}) = \dim(E_k) = s(k). \tag{3.15} \]
We prove simultaneously the second and third statement of Theorem 1.2. The proof is by induction on $k$.

Base Case: Let $k = 1$. Suppose that $u_n \xrightarrow{TL^2} u$. We need to show that $\mathrm{Proj}_1^{(n)}(u_n) \xrightarrow{TL^2} \mathrm{Proj}_1(u)$. Now, note that since the domain $D$ is connected, the multiplicity of $\lambda_1$ is equal to one. In particular, $\mathrm{Proj}_1(u)$ is the function which is identically equal to
\[ \langle u, 1\rangle_\rho = \int_D u\,d\nu(x). \]
On the other hand, thanks to (3.15), it follows that for all large enough $n$ we have $\dim(E_1^{(n)}) = 1$ (note that in particular this means that asymptotically the graphs are connected regardless of what kernel $\eta$ is being used). Therefore, for large enough $n$, $\mathrm{Proj}_1^{(n)}(u_n)$ is the function which is identically equal to $\langle u_n, 1\rangle_{\nu_n}$. Proposition 2.6 implies that $\lim_{n\to\infty}\langle u_n, 1\rangle_{\nu_n} = \langle u, 1\rangle_\rho$ and thus $\mathrm{Proj}_1^{(n)}(u_n) \xrightarrow{TL^2} \mathrm{Proj}_1(u)$ as desired. The second statement of Theorem 1.2 is trivial in this case, since for large enough $n$ the only two eigenvectors of $\mathcal{L}_{n,\varepsilon_n}$ with eigenvalue $\lambda_1^{(n)} = 0$ and with $\|\cdot\|_{\nu_n}$-norm equal to one are the function which is identically equal to one and the function which is identically equal to $-1$.

Inductive Step: Now, suppose that the second and third statements of Theorem 1.2 are true for $1,\dots,k-1$. We want to prove the result for $k$. Let $j \in \{\hat k+1,\dots,\hat k+s(k)\}$. We start by proving the second statement of the theorem. Consider $\{u_j^n\}_{n\in\mathbb{N}}$ as in the statement. From (3.7) it follows that $G_{n,\varepsilon_n}(u_j^n) = \frac{2\lambda_j^{(n)}}{n\varepsilon_n^2}$. Now, from Subsection 3.1 we know that
\[ \lim_{n\to\infty}\frac{2\lambda_j^{(n)}}{n\varepsilon_n^2} = \sigma_\eta\lambda_j \]
and so in particular we have:
\[ \sup_{n\in\mathbb{N}} G_{n,\varepsilon_n}(u_j^n) < \infty. \]
Since the norms of the $u_j^n$ are equal to one, the compactness statement from Theorem 1.4 implies that $\{u_j^n\}_{n\in\mathbb{N}}$ is pre-compact. We have to prove now that every cluster point of $\{u_j^n\}_{n\in\mathbb{N}}$ is an eigenfunction of $\mathcal{L}$ with eigenvalue $\lambda_j$. So without loss of generality let us assume that $u_j^n \xrightarrow{TL^2} u_j$ for some $u_j$. Our goal is to show that $u_j$ is an eigenfunction of $\mathcal{L}$ with eigenvalue $\lambda_j$.

By the induction hypothesis, we have $\mathrm{Proj}_i^{(n)}(u_j^n) \xrightarrow{TL^2} \mathrm{Proj}_i(u_j)$ for every $i = 1,\dots,k-1$. On the other hand, since $\mathrm{Proj}_i^{(n)}(u_j^n) = 0$ for every $n \in \mathbb{N}$ and for every $i = 1,\dots,k-1$, we conclude that $\mathrm{Proj}_i(u_j) = 0$ for all $i = 1,\dots,k-1$. A straightforward computation as in the proof of Proposition 2.14 shows that:
\[ G(u_j) = \sum_{i=k}^{\infty}\bar\lambda_i\|\mathrm{Proj}_i(u_j)\|_\rho^2 \ge \bar\lambda_k\sum_{i=k}^{\infty}\|\mathrm{Proj}_i(u_j)\|_\rho^2 = \bar\lambda_k\|u_j\|_\rho^2. \tag{3.16} \]
In addition, since $\|u_j^n\|_{\nu_n} = 1$ for all $n$, we deduce from Proposition 2.6 that $\|u_j\|_\rho = 1$. Thus,
\[ G(u_j) \ge \bar\lambda_k. \]
On the other hand, the liminf inequality from Theorem 1.4 implies that:
\[ \sigma_\eta\bar\lambda_k = \sigma_\eta\lambda_j = \lim_{n\to\infty}\frac{2\lambda_j^{(n)}}{n\varepsilon_n^2} = \lim_{n\to\infty} G_{n,\varepsilon_n}(u_j^n) \ge \sigma_\eta G(u_j) \ge \sigma_\eta\bar\lambda_k. \]
Therefore, $G(u_j) = \bar\lambda_k$ and from (3.16) we conclude that $\|\mathrm{Proj}_i(u_j)\|_\rho = 0$ for all $i \neq k$. Thus, $u_j$ is an eigenfunction of $\mathcal{L}$ with corresponding eigenvalue $\lambda_j$ ($= \bar\lambda_k$).

Now we prove the third statement from Theorem 1.2. Suppose that $u_n \xrightarrow{TL^2} u$. We want to show that $\mathrm{Proj}_k^{(n)}(u_n) \xrightarrow{TL^2} \mathrm{Proj}_k(u)$. To achieve this we prove that for a given sequence of natural numbers there exists a further subsequence for which the convergence holds. We do not relabel subsequences to avoid cumbersome notation.

From (3.15) it follows that for large enough $n$, $\dim(E_k^{(n)}) = s(k)$. Hence, for large enough $n$, we can consider $\{u_1^n,\dots,u_{s(k)}^n\}$ an orthonormal basis (with respect to the inner product $\langle\cdot,\cdot\rangle_{\nu_n}$) for $E_k^{(n)}$, where $u_j^n$ is an eigenvector of $\mathcal{L}_{n,\varepsilon_n}$ with corresponding eigenvalue $\lambda^{(n)}_{\hat k+j}$. Now, by the first part of the proof, for every $j = 1,\dots,s(k)$, the sequence $\{u_j^n\}_{n\in\mathbb{N}}$ is pre-compact in $TL^2$. Therefore, passing to a subsequence that we do not relabel, we can assume that for every $j = 1,\dots,s(k)$ we have:
\[ u_j^n \xrightarrow{TL^2} u_j, \ \text{ as } n \to \infty, \tag{3.17} \]

for some $u_j \in L^2(D)$. From (2.6), the $u_j$ satisfy $\|u_j\|_\rho = 1$ for every $j$ and $\langle u_i, u_j\rangle_\rho = 0$ for $i \neq j$. In other words, $\{u_1,\dots,u_{s(k)}\}$ is an orthonormal set in $L^2(D)$ (with respect to $\langle\cdot,\cdot\rangle_\rho$). Furthermore, $u_j \in E_k$ for all $j$ by the first part of the proof. In other words, $\{u_1,\dots,u_{s(k)}\}$ is an orthonormal basis for $E_k$ and in particular
\[ \mathrm{Proj}_k(u) = \sum_{j=1}^{s(k)}\langle u, u_j\rangle_\rho u_j. \]
On the other hand, for large enough $n$, we have
\[ \mathrm{Proj}_k^{(n)}(u_n) = \sum_{j=1}^{s(k)}\langle u_n, u_j^n\rangle_{\nu_n}u_j^n. \]
Finally, the fact that $u_n \xrightarrow{TL^2} u$ and (3.17), combined with Proposition 2.6, imply that
\[ \mathrm{Proj}_k^{(n)}(u_n) = \sum_{j=1}^{s(k)}\langle u_n, u_j^n\rangle_{\nu_n}u_j^n \xrightarrow{TL^2} \sum_{j=1}^{s(k)}\langle u, u_j\rangle_\rho u_j = \mathrm{Proj}_k(u). \]

3.3. Consistency of spectral clustering. Here we prove statement 4. of Theorem 1.2.

The procedure in Algorithm 1 can be reformulated as follows. Let $\mu_n = (u_1^n,\dots,u_k^n)_\sharp\nu_n$, where $u_1^n,\dots,u_k^n$ are orthonormal eigenvectors of $\mathcal{L}_{n,\varepsilon_n}$ corresponding to eigenvalues $\lambda_1^{(n)},\dots,\lambda_k^{(n)}$, respectively. Consider the functional $F_{\mu_n,k}$. Let $z_n$ be its minimizer, and let $\tilde G_1^n,\dots,\tilde G_k^n$ be the corresponding clusters. The clusters $G_1^n,\dots,G_k^n$ of Algorithm 1 are defined by $G_i^n = (u_1^n,\dots,u_k^n)^{-1}(\tilde G_i^n)$.
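
The reformulation above (embed the sample with the first $k$ eigenvectors, then cluster the embedded points by k-means) can be sketched in a few lines. This is an illustrative sketch rather than the paper's Algorithm 1 verbatim: it assumes the indicator kernel, normalizes eigenvectors to unit $\|\cdot\|_{\nu_n}$ norm, and replaces exact minimization of the k-means functional by Lloyd iterations with a farthest-point initialization. The function name and parameters are hypothetical.

```python
import numpy as np

def spectral_clustering(x, eps, k, iters=50):
    """Embed with the first k Laplacian eigenvectors, then run Lloyd's k-means heuristic."""
    n, d = x.shape
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    W = (dist <= eps).astype(float) / eps**d
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    embedding = np.sqrt(n) * vecs[:, :k]   # row i is (u_1(x_i), ..., u_k(x_i))
    # Farthest-point initialization of the k centers in the embedded space.
    centers = [embedding[0]]
    for _ in range(1, k):
        d2 = np.min([((embedding - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(embedding[d2.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((embedding[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = embedding[labels == i].mean(axis=0)
    return labels

# Two groups of 40 points that are far apart compared with eps.
rng = np.random.default_rng(5)
blob = rng.uniform(size=(40, 2))
x = np.vstack([blob, blob + 10.0])
labels = spectral_clustering(x, eps=2.0, k=2)
```

In this toy example the graph has two connected components, so the embedded points collapse to two distinct locations and the labels recover the two groups.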

By Theorem 1.8 the sequence $z_n$ is precompact. By Corollary 1.9 the sequence of measures $\mu_n^i = \mu_n\llcorner\tilde G_i^n$ is precompact for all $i = 1,\dots,k$. Consider a subsequence along which $\mu_n^i$ converges for every $i = 1,\dots,k$, and denote the limit by $\mu^i$. Since $z_n^i$ is the centroid of $\mu_n^i$, that is $z_n^i = \frac{1}{\mu_n^i(\mathbb{R}^k)}\int y\,d\mu_n^i(y)$, it follows that $z_n^i$ converge as $n \to \infty$ along the same subsequence. By statement 2. of Theorem 1.2, along a further subsequence $(\nu_n, u_i^n)$ converge to $(\nu, u_i)$ in the $TL^2$ sense for all $i = 1,\dots,k$ as $n \to \infty$. Furthermore, from the definition of $TL^2$ convergence it follows that the measures $\mu_n$ converge in the Wasserstein sense to $\mu := (u_1,\dots,u_k)_\sharp\nu$. Combined with the convergence of $\mu_n^i$ to $\mu^i$, this implies, via Lemma 2.5, that $(\mu_n, \chi_{\tilde G_i^n})$ converge in the $TL^2$ topology to $(\mu, \chi_{\tilde G_i})$. Consequently, by Lemma 2.7, $(\nu_n, \chi_{\tilde G_i^n}\circ(u_1^n,\dots,u_k^n))$ converge to $(\nu, \chi_{\tilde G_i}\circ(u_1,\dots,u_k))$ in the $TL^2$ topology. Noting that $\chi_{G_i^n} = \chi_{\tilde G_i^n}\circ(u_1^n,\dots,u_k^n)$ and $\chi_{G_i} = \chi_{\tilde G_i}\circ(u_1,\dots,u_k)$, this implies that $\nu_n\llcorner G_i^n$ converges weakly to $\nu\llcorner G_i$ as desired.
4. Convergence of the spectra of normalized graph Laplacians
We start by proving Theorem 1.6. Recall that for given $u_n \in L^2(\nu_n)$


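\[ \overline{G}_{n,\varepsilon_n}(u_n) = \frac{1}{n\varepsilon_n^2}\sum_{i,j}W_{ij}\Big(\frac{u_n(x_i)}{\sqrt{\mathcal{D}_{ii}}} - \frac{u_n(x_j)}{\sqrt{\mathcal{D}_{jj}}}\Big)^2, \]
where $W_{ij} = \eta_{\varepsilon_n}(x_i - x_j)$ and $\mathcal{D}_{ii} = \sum_k\eta_{\varepsilon_n}(x_i - x_k)$. With a slight abuse of notation we set $\mathcal{D}(x_i) := \mathcal{D}_{ii}$.

For $u_n \in L^2(\nu_n)$, define $\bar u_n \in L^2(\nu_n)$ by
\[ \bar u_n(x_i) := \frac{u_n(x_i)}{\sqrt{\mathcal{D}(x_i)/n}}, \quad i \in \{1,\dots,n\}. \tag{4.1} \]
From the definition of $\overline{G}_{n,\varepsilon_n}$ and $G_{n,\varepsilon_n}$, it follows that $\overline{G}_{n,\varepsilon_n}(u_n) = G_{n,\varepsilon_n}(\bar u_n)$. Similarly, for every $u \in L^2(D)$ it is true that $\overline{G}(u) = G\big(\frac{u}{\sqrt{\rho}}\big)$. To prove Theorem 1.6 we use the following lemma.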
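
The identity $\overline{G}_{n,\varepsilon_n}(u_n) = G_{n,\varepsilon_n}(\bar u_n)$ is a rescaling of the sum and can be confirmed numerically. The sketch below assumes the indicator kernel and the form $G_{n,\varepsilon}(v) = \frac{1}{\varepsilon^2n^2}\sum_{i,j}W_{ij}(v_i - v_j)^2$ for the unnormalized energy, which is defined earlier in the paper and is restated here as an assumption; the function names `G` and `G_bar` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 150, 2, 0.25
x = rng.uniform(size=(n, d))
dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
W = (dist <= eps).astype(float) / eps**d       # W_ij = eta_eps(x_i - x_j), indicator kernel
deg = W.sum(axis=1)                            # D_ii = sum_k W_ik
u = np.cos(np.pi * x[:, 1])

def G(v):
    """Unnormalized energy (1 / (eps^2 n^2)) sum_{i,j} W_ij (v_i - v_j)^2."""
    return (W * (v[:, None] - v[None, :]) ** 2).sum() / (eps**2 * n**2)

def G_bar(v):
    """Normalized energy (1 / (n eps^2)) sum_{i,j} W_ij (v_i/sqrt(D_ii) - v_j/sqrt(D_jj))^2."""
    s = v / np.sqrt(deg)
    return (W * (s[:, None] - s[None, :]) ** 2).sum() / (n * eps**2)

u_bar = u / np.sqrt(deg / n)                   # the rescaling (4.1)
```

Evaluating `G_bar(u)` and `G(u_bar)` gives the same number up to rounding, which is what allows the normalized problem to be reduced to the unnormalized one.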

Lemma 4.1. Assume that the sequence $\{\varepsilon_n\}_{n\in\mathbb{N}}$ satisfies (1.16). With probability one the following statement holds: a sequence $\{u_n\}_{n\in\mathbb{N}}$, with $u_n \in L^2(\nu_n)$, converges to $u \in L^2(\nu)$ in the $TL^2$-metric if and only if $\bar u_n \xrightarrow{TL^2} \frac{u}{\sqrt{\beta_\eta\rho}}$, where $\bar u_n$ is defined in (4.1) and where $\beta_\eta$ is defined in (1.21).


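Proof. We prove that $u_n \xrightarrow{TL^2} u$ implies $\bar u_n \xrightarrow{TL^2} \frac{u}{\sqrt{\beta_\eta\rho}}$; the converse implication is obtained similarly.

Let $\{T_n\}_{n\in\mathbb{N}}$ be the transportation maps from Proposition 1.12, which we know exist with probability one. Using the change of variables (2.3) we obtain
\[ \frac{1}{n}\mathcal{D}(x_i) = \int_D\eta_{\varepsilon_n}\big(x_i - T_n(y)\big)\rho(y)\,dy. \]
If $u_n \xrightarrow{TL^2} u$, in particular from Proposition 2.1 we have $u_n\circ T_n \to u$ in $L^2(\nu)$. By Proposition 2.1, in order to prove that $\bar u_n \xrightarrow{TL^2} \frac{u}{\sqrt{\beta_\eta\rho}}$, it is enough to prove that $\bar u_n\circ T_n \to \frac{u}{\sqrt{\beta_\eta\rho}}$ in $L^2(\nu)$, which in turn is equivalent to $\bar u_n\circ T_n \to \frac{u}{\sqrt{\beta_\eta\rho}}$ in $L^2(D)$ due to the fact that $\rho$ satisfies (1.15). To achieve this, we first find an $L^\infty$-control on the terms $\frac{1}{\sqrt{\mathcal{D}\circ T_n/n}}$ and then prove that $\frac{1}{\sqrt{\mathcal{D}\circ T_n/n}}$ converges point-wise to $\frac{1}{\sqrt{\beta_\eta\rho}}$. Since $u_n\circ T_n \to u$ in $L^2(D)$, this is enough to obtain the desired result. For that purpose, we fix an arbitrary $\alpha > 0$ and define $\underline{\eta}^\alpha : [0,\infty) \to [0,\infty)$ and $\overline{\eta}^\alpha : [0,\infty) \to [0,\infty)$ to be
\[ \underline{\eta}^\alpha(t) := \begin{cases}\eta(t), & \text{if } t > 2\alpha,\\ \eta(2\alpha), & \text{if } t \le 2\alpha,\end{cases} \tag{4.2} \]
and
\[ \overline{\eta}^\alpha(t) := \begin{cases}\eta(t), & \text{if } t > 2\alpha,\\ \eta(0), & \text{if } t \le 2\alpha,\end{cases} \tag{4.3} \]
where we recall that $\eta$ denotes the radial profile of the kernel. We also write $\underline{\eta}^\alpha$ and $\overline{\eta}^\alpha$ for the isotropic kernels whose radial profiles are $\underline{\eta}^\alpha$ and $\overline{\eta}^\alpha$ respectively. Note that thanks to assumption (K2) on $\eta$, we have $\underline{\eta}^\alpha \le \eta \le \overline{\eta}^\alpha$. Set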

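\[ \tilde\varepsilon_n := \varepsilon_n - \frac{2\|Id - T_n\|_\infty}{\alpha}, \qquad \hat\varepsilon_n := \varepsilon_n + \frac{2\|Id - T_n\|_\infty}{\alpha}. \]
Note that thanks to the assumptions on $\varepsilon_n$ and the properties of the maps $T_n$, for large enough $n$, $\tilde\varepsilon_n > 0$, and $\tilde\varepsilon_n/\varepsilon_n \to 1$ and $\hat\varepsilon_n/\varepsilon_n \to 1$ as $n \to \infty$. In addition, from assumption (K2) on $\eta$ and the definitions of $\underline{\eta}^\alpha$, $\overline{\eta}^\alpha$, $\tilde\varepsilon_n$ and $\hat\varepsilon_n$, it is straightforward to check that for large enough $n$ and for Lebesgue almost every $x, y \in D$,
\[ \eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big) \ge \underline{\eta}^\alpha\Big(\frac{|x - y|}{\tilde\varepsilon_n}\Big) \quad\text{and}\quad \eta\Big(\frac{|T_n(x) - T_n(y)|}{\varepsilon_n}\Big) \le \overline{\eta}^\alpha\Big(\frac{|x - y|}{\hat\varepsilon_n}\Big). \]
From these inequalities, we conclude that for large enough $n$ and Lebesgue almost every $x \in D$
\[ \int_D\eta_{\varepsilon_n}\big(T_n(x) - T_n(y)\big)\rho(y)\,dy \ge \Big(\frac{\tilde\varepsilon_n}{\varepsilon_n}\Big)^d\int_D\underline{\eta}^\alpha_{\tilde\varepsilon_n}(x - y)\rho(y)\,dy \tag{4.4} \]
and
\[ \int_D\eta_{\varepsilon_n}\big(T_n(x) - T_n(y)\big)\rho(y)\,dy \le \Big(\frac{\hat\varepsilon_n}{\varepsilon_n}\Big)^d\int_D\overline{\eta}^\alpha_{\hat\varepsilon_n}(x - y)\rho(y)\,dy. \tag{4.5} \]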

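Given that $D$ is assumed to be a bounded open set with Lipschitz boundary, it is straightforward to check that there exist a ball $B(0,\theta)$, a cone $C$ with nonempty interior and a family of rotations $\{R_x\}_{x\in D}$ with the property that for every $x \in D$ it is true that $x + R_x(B(0,\theta)\cap C) \subset D$. For large enough $n$ (so that $1 > \tilde\varepsilon_n > 0$), and for almost every $x \in D$ we have:
\[
\begin{aligned}
\int_D\underline{\eta}^\alpha_{\tilde\varepsilon_n}(x - y)\rho(y)\,dy &\ge m\int_D\underline{\eta}^\alpha_{\tilde\varepsilon_n}(x - y)\,dy = m\int_{\{x + \tilde\varepsilon_nh \in D\}}\underline{\eta}^\alpha(h)\,dh\\
&\ge m\int_{R_x(B(0,\theta)\cap C)}\underline{\eta}^\alpha(h)\,dh = m\int_{B(0,\theta)\cap C}\underline{\eta}^\alpha(h)\,dh > 0,
\end{aligned}
\]
where in the first inequality we used the lower bound $m$ on $\rho$ from assumption (1.15), and we used the change of variables $h = \frac{y - x}{\tilde\varepsilon_n}$ to deduce the first equality; to obtain the last equality we used the fact that $\underline{\eta}^\alpha$ is radially symmetric. From the previous chain of inequalities and from (4.4) we conclude that for large enough $n$ and for almost every $x \in D$ we have
\[ \int_D\eta_{\varepsilon_n}\big(T_n(x) - T_n(y)\big)\rho(y)\,dy \ge b > 0 \]
for some positive constant $b$. From the previous inequality we obtain the desired $L^\infty$-control on the terms $\frac{1}{\sqrt{\mathcal{D}\circ T_n/n}}$. It remains to show that for almost every $x \in D$,
\[ \lim_{n\to\infty}\int_D\eta_{\varepsilon_n}\big(T_n(x) - T_n(y)\big)\rho(y)\,dy = \beta_\eta\rho(x). \tag{4.6} \]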

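For this purpose, we use the continuity of $\rho$ to deduce that for every $x \in D$,

$$\lim_{n \to \infty} \left| \int_D (\underline{\boldsymbol{\eta}}_a)_{\underline{\varepsilon}_n}(x - y) \rho(y)\,dy - \underline{\beta}_a \rho(x) \right| = 0, \tag{4.7}$$

where $\underline{\beta}_a = \int_{\mathbb{R}^d} \underline{\boldsymbol{\eta}}_a(h)\,dh$. Similarly, for every $x \in D$,

$$\lim_{n \to \infty} \left| \overline{\beta}^a \rho(x) - \int_D (\overline{\boldsymbol{\eta}}^a)_{\overline{\varepsilon}_n}(x - y) \rho(y)\,dy \right| = 0, \tag{4.8}$$

where $\overline{\beta}^a = \int_{\mathbb{R}^d} \overline{\boldsymbol{\eta}}^a(h)\,dh$. From (4.4), we deduce that for large enough $n$, and for almost every $x \in D$,

$$\begin{aligned}
\beta_\eta \rho(x) - \int_D \boldsymbol{\eta}_{\varepsilon_n}(T_n(x) - T_n(y)) \rho(y)\,dy
&\le \beta_\eta \rho(x) - \left(\frac{\underline{\varepsilon}_n}{\varepsilon_n}\right)^d \int_D (\underline{\boldsymbol{\eta}}_a)_{\underline{\varepsilon}_n}(x - y) \rho(y)\,dy \\
&= \left(\frac{\underline{\varepsilon}_n}{\varepsilon_n}\right)^d \left( \beta_\eta \rho(x) - \underline{\beta}_a \rho(x) + \underline{\beta}_a \rho(x) - \int_D (\underline{\boldsymbol{\eta}}_a)_{\underline{\varepsilon}_n}(x - y) \rho(y)\,dy \right) \\
&\quad + \left( 1 - \left(\frac{\underline{\varepsilon}_n}{\varepsilon_n}\right)^d \right) \beta_\eta \rho(x).
\end{aligned}$$

Analogously, from (4.5), for almost every $x \in D$,

$$\begin{aligned}
\int_D \boldsymbol{\eta}_{\varepsilon_n}(T_n(x) - T_n(y)) \rho(y)\,dy - \beta_\eta \rho(x)
&\le \left(\frac{\overline{\varepsilon}_n}{\varepsilon_n}\right)^d \left( \overline{\beta}^a \rho(x) - \beta_\eta \rho(x) + \int_D (\overline{\boldsymbol{\eta}}^a)_{\overline{\varepsilon}_n}(x - y) \rho(y)\,dy - \overline{\beta}^a \rho(x) \right) \\
&\quad + \left( \left(\frac{\overline{\varepsilon}_n}{\varepsilon_n}\right)^d - 1 \right) \beta_\eta \rho(x).
\end{aligned}$$

From these previous inequalities, (4.7) and (4.8) we conclude that for almost every $x \in D$,

$$\limsup_{n \to \infty} \left| \beta_\eta \rho(x) - \int_D \boldsymbol{\eta}_{\varepsilon_n}(T_n(x) - T_n(y)) \rho(y)\,dy \right| \le \rho(x) \left( \overline{\beta}^a - \underline{\beta}_a \right).$$

Finally, given that $a$ was arbitrary we can take $a \to 0$ in the previous expression to deduce that the left hand side of the previous expression is actually equal to zero. This establishes (4.6) and thus the desired result. □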
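The pointwise limit (4.6) can be observed numerically. The sketch below is only an illustration under stated assumptions: it takes $D$ to be the unit square with uniform density $\rho \equiv 1$, uses the indicator profile $\eta = 1_{[0,1]}$ (so that $\beta_\eta = \pi$ in dimension two), and assumes that the degree is $\mathcal{D}(x_i) = \sum_j \boldsymbol{\eta}_\varepsilon(x_i - x_j)$ with $\boldsymbol{\eta}_\varepsilon(z) = \varepsilon^{-d} \eta(|z|/\varepsilon)$, including the self term. The function name `rescaled_degrees` is ours.

```python
import numpy as np

def rescaled_degrees(points, eps):
    """Return D(x_i)/n = (1/n) * sum_j eps^{-d} * eta(|x_i - x_j| / eps)
    for the indicator profile eta = 1_{[0,1]} (an illustrative choice)."""
    n, d = points.shape
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    weights = (dists <= eps).astype(float) / eps ** d
    return weights.sum(axis=1) / n

rng = np.random.default_rng(0)
n, eps = 1500, 0.1
points = rng.random((n, 2))            # samples of the uniform density on the unit square
deg = rescaled_degrees(points, eps)

# Points at distance more than eps from the boundary see a full kernel ball.
interior = np.all((points > eps) & (points < 1 - eps), axis=1)
beta_eta = np.pi                        # integral of the indicator of the unit disc
print(deg[interior].mean(), beta_eta)
```

At these sample sizes the agreement is only approximate; the self term contributes $1/(n \varepsilon^d)$ to each rescaled degree, and points within $\varepsilon$ of the boundary see only part of the kernel ball.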


The proof of Theorem 1.6 is now straightforward.

Proof of Theorem 1.6. Liminf inequality: Let $u \in L^2(D)$ and suppose that $\{u_n\}_{n \in \mathbb{N}}$, $u_n \in L^2(\nu_n)$, is such that $u_n \xrightarrow{TL^2} u$. From Lemma 4.1 we know that $\bar{u}_n \xrightarrow{TL^2} \frac{u}{\sqrt{\beta_\eta \rho}}$, where $\bar{u}_n$ was defined in (4.1).

From Theorem 1.4 and the discussion at the beginning of this section, we obtain

$$\liminf_{n \to \infty} \overline{G}_{n,\varepsilon_n}(u_n) = \liminf_{n \to \infty} G_{n,\varepsilon_n}(\bar{u}_n) \ge \sigma_\eta G\left( \frac{u}{\sqrt{\beta_\eta \rho}} \right) = \frac{\sigma_\eta}{\beta_\eta} \overline{G}(u),$$

where the inequality is obtained using the liminf inequality from Theorem 1.4.

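Limsup inequality: Let $u \in L^2(D)$. Since $\rho$ is bounded below by a positive constant, $\frac{u}{\sqrt{\beta_\eta \rho}}$ belongs to $L^2(D)$ as well. From the limsup inequality in Theorem 1.4 there exists a sequence $\{v_n\}_{n \in \mathbb{N}}$, $v_n \in L^2(\nu_n)$, with $v_n \xrightarrow{TL^2} \frac{u}{\sqrt{\beta_\eta \rho}}$ and such that

$$\limsup_{n \to \infty} G_{n,\varepsilon_n}(v_n) \le \sigma_\eta G\left( \frac{u}{\sqrt{\beta_\eta \rho}} \right) = \frac{\sigma_\eta}{\beta_\eta} \overline{G}(u).$$

Let us consider the function $u_n \in L^2(\nu_n)$ given by $u_n(x_i) := v_n(x_i) \sqrt{\mathcal{D}(x_i)/n}$ for $i = 1, \dots, n$. Lemma 4.1 implies that $u_n \xrightarrow{TL^2} u$. From the discussion at the beginning of this section we obtain

$$\limsup_{n \to \infty} \overline{G}_{n,\varepsilon_n}(u_n) = \limsup_{n \to \infty} G_{n,\varepsilon_n}(v_n) \le \frac{\sigma_\eta}{\beta_\eta} \overline{G}(u).$$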


Compactness: Suppose that $\{u_n\}_{n \in \mathbb{N}}$, $u_n \in L^2(\nu_n)$, is such that

$$\sup_{n \in \mathbb{N}} \|u_n\|_{L^2(\nu_n)} < \infty, \qquad \sup_{n \in \mathbb{N}} \overline{G}_{n,\varepsilon_n}(u_n) < \infty.$$

From the discussion at the beginning of the section, we deduce that $\sup_{n \in \mathbb{N}} G_{n,\varepsilon_n}(\bar{u}_n) < \infty$. Also, from the proof of Lemma 4.1 the terms $\frac{1}{\sqrt{\mathcal{D} \circ T_n / n}}$ are uniformly bounded in $L^\infty$. This implies that $\sup_{n \in \mathbb{N}} \|\bar{u}_n\|_{L^2(\nu_n)} < \infty$ as well. Hence, we can apply the compactness property from Theorem 1.4 to conclude that $\{\bar{u}_n\}_{n \in \mathbb{N}}$ is precompact in $TL^2$. Using Lemma 4.1 this implies that $\{u_n\}_{n \in \mathbb{N}}$ is precompact in $TL^2$ as well. □

Proof of Theorem 1.5. Using Theorem 1.6, similar arguments to the ones used in the proof of Theorem 1.2 can be used to establish statements 1., 2., and 3. of Theorem 1.5.

The proof of statement 4. (consistency of spectral clustering) of Theorem 1.5 is analogous to the proof of statement 4. of Theorem 1.2, which is given in Subsection 3.3. The reason that the normalization step does not create new difficulties is the following: since the eigenvectors $\mathbf{u}^n := (u_1^n, \dots, u_k^n)$ of $\mathcal{N}^{sym}_{n,\varepsilon_n}$ converge in $TL^2$ to eigenfunctions $\mathbf{u} = (u_1, \dots, u_k)$ of $\mathcal{N}^{sym}$ along a subsequence, it can be shown that the normalized vectors $\mathbf{u}^n / \|\mathbf{u}^n\|$ converge to $\mathbf{u} / \|\mathbf{u}\|$ in $TL^2$ provided that the set of $x \in D$ for which $\mathbf{u} = 0$ is of $\nu$-measure zero.

In fact, assuming that $\nu(\{x \in D : \mathbf{u}(x) = 0\}) = 0$ let us show the $TL^2$ convergence. From the assumption on the set of zeroes of $\mathbf{u}$ it follows that $\lim_{H \to 0^+} \nu(\{x \in D : \|\mathbf{u}(x)\| < H\}) = 0$. Let $U_H = \{(x,y) \in D \times D : \|\mathbf{u}(x)\| < H\}$. Given $n \in \mathbb{N}$ let $\pi_n \in \Pi(\nu, \nu_n)$ be such that

$$\iint_{D \times D} |x - y|^2 + \|\mathbf{u}(x) - \mathbf{u}^n(y)\|^2 \, d\pi_n(x,y) \le 2 d_{TL^2}^2(\mathbf{u}^n, \mathbf{u}).$$

Then for any $H > 0$

$$\begin{aligned}
d_{TL^2}^2\left( \frac{\mathbf{u}^n}{\|\mathbf{u}^n\|}, \frac{\mathbf{u}}{\|\mathbf{u}\|} \right)
&\le \iint_{D \times D} |x - y|^2 \, d\pi_n(x,y) + \iint_{U_H} 2^2 \, d\pi_n(x,y) \\
&\quad + \iint_{D \times D \setminus U_H} \frac{\big\| \, \|\mathbf{u}^n(y)\| \mathbf{u}(x) \pm \|\mathbf{u}^n(y)\| \mathbf{u}^n(y) - \|\mathbf{u}(x)\| \mathbf{u}^n(y) \big\|^2}{\|\mathbf{u}(x)\|^2 \|\mathbf{u}^n(y)\|^2} \, d\pi_n(x,y) \\
&\le 4 d_{TL^2}^2(\mathbf{u}^n, \mathbf{u}) + 4 \nu(\{\|\mathbf{u}\| < H\}) + \frac{16}{H^2} d_{TL^2}^2(\mathbf{u}^n, \mathbf{u}).
\end{aligned}$$

The right hand side can be made arbitrarily small by first picking $H$ small enough, so that $\nu(\{\|\mathbf{u}\| < H\})$ is small, and then $n$ large enough along the subsequence where $\mathbf{u}^n$ converges to $\mathbf{u}$. The convergence of normalized eigenvector $k$-tuples follows.
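The estimate on the complement of $U_H$ can be read as an application of the elementary inequality $\left\| \frac{a}{\|a\|} - \frac{b}{\|b\|} \right\| \le \frac{2 \|a - b\|}{\|a\|}$ for nonzero vectors $a, b$, used with $a = \mathbf{u}(x)$, $b = \mathbf{u}^n(y)$ and $\|a\| \ge H$. This reading is ours, and the Python sketch below is only a numerical sanity check of that inequality on random vectors; the helper names are hypothetical.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    """Return v / |v| (assumes v is nonzero)."""
    s = norm(v)
    return [c / s for c in v]

def dist(v, w):
    """Euclidean distance between two vectors."""
    return norm([a - b for a, b in zip(v, w)])

def worst_ratio(trials=10000, k=3, seed=0):
    """Largest observed value of |a/|a| - b/|b|| divided by 2|a - b|/|a|."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(k)]
        b = [rng.uniform(-1, 1) for _ in range(k)]
        # Skip degenerate draws where a quotient would be ill-defined.
        if norm(a) < 1e-6 or norm(b) < 1e-6 or dist(a, b) < 1e-9:
            continue
        lhs = dist(normalize(a), normalize(b))
        rhs = 2 * dist(a, b) / norm(a)
        worst = max(worst, lhs / rhs)
    return worst

print(worst_ratio())
```

The observed ratio never exceeds one, which matches the bound $4 \|a - b\|^2 / H^2$ for the squared integrand when $\|a\| \ge H$.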

To show that $\nu(\{x \in D : \mathbf{u}(x) = 0\}) = 0$ it suffices to show that the set of $x \in D$ for which $u_1(x) = 0$ has zero Lebesgue measure. To show this, we need the extra technical condition that $\rho \in C^1(D)$. Because of it and the fact that $\rho$ is bounded away from zero, it follows from the regularity theory of elliptic PDEs that the function $w_1 := u_1 / \sqrt{\rho}$ is of class $C^{1,\alpha}(D)$ (for $\alpha \in (0,1)$) and is a solution of

$$-\mathrm{div}(\rho^2 \nabla w_1) - \tau_1 \rho^2 w_1 = 0, \quad \forall x \in D.$$

Consider the sets

$$N(w_1) := \{x \in D : w_1(x) = 0\}, \qquad S(w_1) := \{x \in N(w_1) : \nabla w_1(x) = 0\}.$$

By the implicit function theorem, it follows that $N(w_1) \setminus S(w_1)$ can be covered by at most countably many $(d-1)$-dimensional manifolds and hence the Lebesgue measure of $N(w_1) \setminus S(w_1)$ is equal to zero. On the other hand, it follows from the results in EU. that $S(w_1)$ is $(d-2)$-rectifiable, which in particular implies that the Lebesgue measure of $S(w_1)$ is equal to zero. Since $u_1^{-1}(\{0\}) = N(w_1)$, we conclude that the set in which $u_1$ is equal to zero has zero Lebesgue measure. □


Proof of Corollary 1.3. Given a sequence $\{u_k^n\}_{n \in \mathbb{N}}$, as in the statement of the corollary, we define

$$w_k^n := \mathcal{D}^{1/2} u_k^n.$$

From (HD it follows that $w_k^n$ is an eigenvector of $\mathcal{N}^{sym}_{n,\varepsilon_n}$. We consider a rescaled version of the vectors $w_k^n$, by setting

$$\tilde{w}_k^n := \frac{1}{\sqrt{n}} w_k^n = \frac{1}{\sqrt{n}} \mathcal{D}^{1/2} u_k^n.$$

From the proof of Lemma 4.1 it follows that

$$\sup_{n \in \mathbb{N}} \|\tilde{w}_k^n\|_{L^2(\nu_n)} < \infty.$$

Thus, from Theorem 1.5, up to subsequence,

$$\tilde{w}_k^n \xrightarrow{TL^2} w,$$

for some $w \in L^2(D)$ which is an eigenfunction of $\mathcal{N}^{sym}$ with eigenvalue $\tau_k$. Hence, up to subsequence, from Lemma 4.1 it follows that

$$u_k^n \xrightarrow{TL^2} \frac{w}{\sqrt{\beta_\eta \rho}}.$$

By the discussion of Subsection 2.4 it follows that

$$\frac{w}{\sqrt{\beta_\eta \rho}}$$

is an eigenfunction of $\mathcal{N}^{rw}$ with eigenvalue $\tau_k$.

The proof of convergence of clusters is the same as given in the proof of Theorem 1.2 presented in Subsection 3.3. □
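The change of variables $w = \mathcal{D}^{1/2} u$ between eigenvectors of the two normalized graph Laplacians can be checked on a small example. The sketch below assumes the standard definitions $\mathcal{N}^{sym} = I - \mathcal{D}^{-1/2} W \mathcal{D}^{-1/2}$ and $\mathcal{N}^{rw} = I - \mathcal{D}^{-1} W$, which we take to match those of the paper up to scaling; the weight matrix is random, hypothetical data. It computes eigenpairs $(\tau, w)$ of the symmetric operator and verifies that $u = \mathcal{D}^{-1/2} w$ is an eigenvector of the random walk operator with the same eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.random((n, n))
W = (A + A.T) / 2                 # symmetric positive weights (hypothetical data)
np.fill_diagonal(W, 0.0)
deg = W.sum(axis=1)               # diagonal of the degree matrix D

N_sym = np.eye(n) - W / np.sqrt(np.outer(deg, deg))   # I - D^{-1/2} W D^{-1/2}
N_rw = np.eye(n) - W / deg[:, None]                   # I - D^{-1} W

tau, w = np.linalg.eigh(N_sym)    # columns of w are eigenvectors of N_sym
u = w / np.sqrt(deg)[:, None]     # u = D^{-1/2} w, column by column

# N_rw u_j should equal tau_j u_j, and D^{1/2} u_j should recover w_j.
rw_residual = np.abs(N_rw @ u - u * tau).max()
back_residual = np.abs(np.sqrt(deg)[:, None] * u - w).max()
print(rw_residual, back_residual)
```

Working from the symmetric operator lets the example use a symmetric eigensolver; the relation itself is the one used in the proof, read in the opposite direction.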